* [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc
[not found] <20221017205446.523796-1-w36195@motorola.com>
@ 2022-10-17 20:54 ` Dan Vacura
2022-10-18 1:50 ` Bagas Sanjaya
2022-10-17 20:54 ` [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release " Dan Vacura
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Dan Vacura @ 2022-10-17 20:54 UTC (permalink / raw)
To: linux-usb
Cc: Daniel Scally, Thinh Nguyen, Jeff Vanhoof, Dan Vacura, stable,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Michael Grzeschik, Paul Elder, linux-kernel,
linux-doc
With the re-use of the previous completion status in 0d1c407b1a749
("usb: dwc3: gadget: Return proper request status") it could be possible
that the next frame would also get dropped if the current frame has a
missed isoc error. Ensure that an interrupt is requested for the start
of a new frame.
Fixes: fc78941d8169 ("usb: gadget: uvc: decrease the interrupt load to a quarter")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Vacura <w36195@motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/gadget/function/uvc_video.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index bb037fcc90e6..323977716f5a 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -431,7 +431,8 @@ static void uvcg_video_pump(struct work_struct *work)
/* Endpoint now owns the request */
req = NULL;
- video->req_int_count++;
+ if (buf->state != UVC_BUF_STATE_DONE)
+ video->req_int_count++;
}
if (!req)
--
2.34.1
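For readers following the interrupt accounting, the effect of the one-line change can be modelled outside the kernel. The sketch below is a simplified, hypothetical model and not the driver code: the struct, the helper names and the "every quarter of the requests" rule are assumptions based on the subject of fc78941d8169. It shows that once the counter is not incremented for the last request of a frame, the first request of the next frame asks for an interrupt again.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the counters kept in struct uvc_video. */
struct pump_state {
	unsigned int req_int_count;	/* requests queued since the last interrupt */
	unsigned int num_requests;	/* requests allocated for the endpoint */
};

/* Ceiling division, equivalent to the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Assumed moderation rule: request an interrupt when no free request is
 * left, when the frame is finished, or once per quarter of the requests.
 * The counter is reset whenever an interrupt is requested.
 */
static bool pump_wants_interrupt(struct pump_state *s, bool free_list_empty,
				 bool frame_done)
{
	if (free_list_empty || frame_done ||
	    !(s->req_int_count % div_round_up(s->num_requests, 4))) {
		s->req_int_count = 0;
		return true;
	}
	return false;
}

/* Patched accounting: do not count the request that finished a frame. */
static void pump_account_queued(struct pump_state *s, bool frame_done)
{
	if (!frame_done)
		s->req_int_count++;
}
```

With an unconditional increment the counter would be 1 when the next frame starts, so under this model its first request would be queued with no_interrupt set.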
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
[not found] <20221017205446.523796-1-w36195@motorola.com>
2022-10-17 20:54 ` [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc Dan Vacura
@ 2022-10-17 20:54 ` Dan Vacura
2022-10-17 21:30 ` Thinh Nguyen
2024-02-22 0:02 ` Michael Grzeschik
2022-10-17 20:54 ` [PATCH v3 3/6] usb: gadget: uvc: fix sg handling in error case Dan Vacura
2022-10-17 20:54 ` [PATCH v3 4/6] usb: gadget: uvc: fix sg handling during video encode Dan Vacura
3 siblings, 2 replies; 19+ messages in thread
From: Dan Vacura @ 2022-10-17 20:54 UTC (permalink / raw)
To: linux-usb
Cc: Daniel Scally, Thinh Nguyen, Jeff Vanhoof, stable, Dan Vacura,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, Michael Grzeschik, linux-kernel,
linux-doc
From: Jeff Vanhoof <qjv001@motorola.com>
arm-smmu related crashes seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.
Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001@motorola.com>
Co-developed-by: Dan Vacura <w36195@motorola.com>
Signed-off-by: Dan Vacura <w36195@motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
--
2.34.1
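The status-to-errno mapping touched by the gadget.c hunk can be summarised in a small standalone sketch. The helper name is hypothetical, the constants are copied from the core.h hunk, and only the cases visible in the diff are covered; the driver's switch has further cases above the hunk that are not reproduced here.

```c
#include <errno.h>

/* Values copied from the core.h hunk in this patch. */
#define DWC3_REQUEST_STATUS_STALLED	4
#define DWC3_REQUEST_STATUS_MISSED_ISOC	6
#define DWC3_REQUEST_STATUS_UNKNOWN	-1

/*
 * Hypothetical helper: the errno a cancelled request would be given back
 * with, for the cases shown in the hunk. Anything else falls through to
 * the default, which the driver also reports with dev_err().
 */
static int cancelled_status_to_errno(int status)
{
	switch (status) {
	case DWC3_REQUEST_STATUS_STALLED:
		return -EPIPE;
	case DWC3_REQUEST_STATUS_MISSED_ISOC:
		return -EXDEV;
	default:
		return -ECONNRESET;
	}
}
```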
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v3 3/6] usb: gadget: uvc: fix sg handling in error case
[not found] <20221017205446.523796-1-w36195@motorola.com>
2022-10-17 20:54 ` [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc Dan Vacura
2022-10-17 20:54 ` [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release " Dan Vacura
@ 2022-10-17 20:54 ` Dan Vacura
2022-10-17 20:54 ` [PATCH v3 4/6] usb: gadget: uvc: fix sg handling during video encode Dan Vacura
3 siblings, 0 replies; 19+ messages in thread
From: Dan Vacura @ 2022-10-17 20:54 UTC (permalink / raw)
To: linux-usb
Cc: Daniel Scally, Thinh Nguyen, Jeff Vanhoof, Dan Vacura, stable,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, Michael Grzeschik, linux-kernel,
linux-doc
If there is a transmission error the buffer will be returned too early,
causing a memory fault as subsequent requests for that buffer are still
queued up to be sent. Refactor the error handling to wait for the final
request to come in before reporting back the buffer to userspace for all
transfer types (bulk/isoc/isoc_sg). This ensures userspace knows if the
frame was successfully sent.
Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <stable@vger.kernel.org> # 859c675d84d4: usb: gadget: uvc: consistently use define for headerlen
Cc: <stable@vger.kernel.org> # f262ce66d40c: usb: gadget: uvc: use on returned header len in video_encode_isoc_sg
Cc: <stable@vger.kernel.org> # 61aa709ca58a: usb: gadget: uvc: rework uvcg_queue_next_buffer to uvcg_complete_buffer
Cc: <stable@vger.kernel.org> # 9b969f93bcef: usb: gadget: uvc: giveback vb2 buffer on req complete
Cc: <stable@vger.kernel.org> # aef11279888c: usb: gadget: uvc: improve sg exit condition
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Vacura <w36195@motorola.com>
---
V1 -> V2:
- undo error rename
- change uvcg_info to uvcg_dbg
V2 -> V3:
- no changes
drivers/usb/gadget/function/uvc_queue.c | 8 +++++---
drivers/usb/gadget/function/uvc_video.c | 18 ++++++++++++++----
2 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index ec500ee499ee..0aa3d7e1f3cc 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -304,6 +304,7 @@ int uvcg_queue_enable(struct uvc_video_queue *queue, int enable)
queue->sequence = 0;
queue->buf_used = 0;
+ queue->flags &= ~UVC_QUEUE_DROP_INCOMPLETE;
} else {
ret = vb2_streamoff(&queue->queue, queue->queue.type);
if (ret < 0)
@@ -329,10 +330,11 @@ int uvcg_queue_enable(struct uvc_video_queue *queue, int enable)
void uvcg_complete_buffer(struct uvc_video_queue *queue,
struct uvc_buffer *buf)
{
- if ((queue->flags & UVC_QUEUE_DROP_INCOMPLETE) &&
- buf->length != buf->bytesused) {
- buf->state = UVC_BUF_STATE_QUEUED;
+ if (queue->flags & UVC_QUEUE_DROP_INCOMPLETE) {
+ queue->flags &= ~UVC_QUEUE_DROP_INCOMPLETE;
+ buf->state = UVC_BUF_STATE_ERROR;
vb2_set_plane_payload(&buf->buf.vb2_buf, 0, 0);
+ vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_ERROR);
return;
}
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 91a58567beac..dd54841b0b3e 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -88,6 +88,7 @@ uvc_video_encode_bulk(struct usb_request *req, struct uvc_video *video,
struct uvc_buffer *buf)
{
void *mem = req->buf;
+ struct uvc_request *ureq = req->context;
int len = video->req_size;
int ret;
@@ -113,13 +114,14 @@ uvc_video_encode_bulk(struct usb_request *req, struct uvc_video *video,
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
list_del(&buf->queue);
- uvcg_complete_buffer(&video->queue, buf);
video->fid ^= UVC_STREAM_FID;
+ ureq->last_buf = buf;
video->payload_size = 0;
}
if (video->payload_size == video->max_payload_size ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE ||
buf->bytesused == video->queue.buf_used)
video->payload_size = 0;
}
@@ -180,7 +182,8 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
req->length -= len;
video->queue.buf_used += req->length - header_len;
- if (buf->bytesused == video->queue.buf_used || !buf->sg) {
+ if (buf->bytesused == video->queue.buf_used || !buf->sg ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE) {
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
buf->offset = 0;
@@ -195,6 +198,7 @@ uvc_video_encode_isoc(struct usb_request *req, struct uvc_video *video,
struct uvc_buffer *buf)
{
void *mem = req->buf;
+ struct uvc_request *ureq = req->context;
int len = video->req_size;
int ret;
@@ -209,12 +213,13 @@ uvc_video_encode_isoc(struct usb_request *req, struct uvc_video *video,
req->length = video->req_size - len;
- if (buf->bytesused == video->queue.buf_used) {
+ if (buf->bytesused == video->queue.buf_used ||
+ video->queue.flags & UVC_QUEUE_DROP_INCOMPLETE) {
video->queue.buf_used = 0;
buf->state = UVC_BUF_STATE_DONE;
list_del(&buf->queue);
- uvcg_complete_buffer(&video->queue, buf);
video->fid ^= UVC_STREAM_FID;
+ ureq->last_buf = buf;
}
}
@@ -255,6 +260,11 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
case 0:
break;
+ case -EXDEV:
+ uvcg_dbg(&video->uvc->func, "VS request missed xfer.\n");
+ queue->flags |= UVC_QUEUE_DROP_INCOMPLETE;
+ break;
+
case -ESHUTDOWN: /* disconnect from host. */
uvcg_dbg(&video->uvc->func, "VS request cancelled.\n");
uvcg_queue_cancel(queue, 1);
--
2.34.1
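The flag handling introduced here can be illustrated with a minimal model. Everything below is a hypothetical stand-in (type names, helper names and the bit value are assumptions, not the driver's definitions): a missed transfer marks the frame in flight, and the mark is consumed exactly once when the buffer is completed, so only that frame is reported as an error.

```c
#include <errno.h>

#define QUEUE_DROP_INCOMPLETE	(1u << 0)	/* hypothetical bit value */

enum buf_result { BUF_OK, BUF_ERROR };

/* Hypothetical stand-in for the flags field of struct uvc_video_queue. */
struct queue_model {
	unsigned int flags;
};

/* Request completion: a missed isoc (-EXDEV) taints the current frame. */
static void on_request_status(struct queue_model *q, int status)
{
	if (status == -EXDEV)
		q->flags |= QUEUE_DROP_INCOMPLETE;
}

/*
 * Buffer completion, called once the final request of the frame has come
 * back: report an error if any request of the frame was missed, and clear
 * the flag so that the next frame starts clean.
 */
static enum buf_result complete_buffer(struct queue_model *q)
{
	if (q->flags & QUEUE_DROP_INCOMPLETE) {
		q->flags &= ~QUEUE_DROP_INCOMPLETE;
		return BUF_ERROR;
	}
	return BUF_OK;
}
```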
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v3 4/6] usb: gadget: uvc: fix sg handling during video encode
[not found] <20221017205446.523796-1-w36195@motorola.com>
` (2 preceding siblings ...)
2022-10-17 20:54 ` [PATCH v3 3/6] usb: gadget: uvc: fix sg handling in error case Dan Vacura
@ 2022-10-17 20:54 ` Dan Vacura
3 siblings, 0 replies; 19+ messages in thread
From: Dan Vacura @ 2022-10-17 20:54 UTC (permalink / raw)
To: linux-usb
Cc: Daniel Scally, Thinh Nguyen, Jeff Vanhoof, stable, Dan Vacura,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Michael Grzeschik, Paul Elder, linux-kernel,
linux-doc
From: Jeff Vanhoof <qjv001@motorola.com>
In uvc_video_encode_isoc_sg, the uvc_request's sg list is
incorrectly being populated leading to corrupt video being
received by the remote end. When building the sg list the
usage of buf->sg's 'dma_length' field is not correct and
instead its 'length' field should be used.
Fixes: e81e7f9a0eb9 ("usb: gadget: uvc: add scatter gather support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001@motorola.com>
Signed-off-by: Dan Vacura <w36195@motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/gadget/function/uvc_video.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index dd54841b0b3e..7d4508a83d5d 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -157,10 +157,10 @@ uvc_video_encode_isoc_sg(struct usb_request *req, struct uvc_video *video,
sg = sg_next(sg);
for_each_sg(sg, iter, ureq->sgt.nents - 1, i) {
- if (!len || !buf->sg || !sg_dma_len(buf->sg))
+ if (!len || !buf->sg || !buf->sg->length)
break;
- sg_left = sg_dma_len(buf->sg) - buf->offset;
+ sg_left = buf->sg->length - buf->offset;
part = min_t(unsigned int, len, sg_left);
sg_set_page(iter, sg_page(buf->sg), part, buf->offset);
--
2.34.1
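To see why the choice of field matters, consider a segment whose DMA-mapped length differs from its CPU-side length, which can happen when the mapping layer merges entries. The sketch below uses a hypothetical two-field struct instead of struct scatterlist and a hypothetical helper name; it mirrors the sg_left/part arithmetic of the patched loop using the 'length' field.

```c
/* Hypothetical stand-in for the two length fields of a scatterlist entry. */
struct seg {
	unsigned int length;		/* bytes backed by this entry's page(s) */
	unsigned int dma_length;	/* length of the mapped DMA segment */
};

/*
 * Bytes to take from the current segment: the smaller of what the request
 * still needs (len) and what is left in the segment after offset.
 */
static unsigned int seg_part(const struct seg *s, unsigned int offset,
			     unsigned int len)
{
	unsigned int sg_left = s->length - offset;

	return len < sg_left ? len : sg_left;
}
```

In this model, a segment with length 4096 and dma_length 8192 at offset 1000 yields at most 3096 bytes, whereas arithmetic based on dma_length would allow reading past the end of the entry.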
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-17 20:54 ` [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release " Dan Vacura
@ 2022-10-17 21:30 ` Thinh Nguyen
2022-10-18 2:10 ` Dan Vacura
2024-02-22 0:02 ` Michael Grzeschik
1 sibling, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2022-10-17 21:30 UTC (permalink / raw)
To: Dan Vacura
Cc: linux-usb@vger.kernel.org, Daniel Scally, Thinh Nguyen,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
Michael Grzeschik, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
On Mon, Oct 17, 2022, Dan Vacura wrote:
> From: Jeff Vanhoof <qjv001@motorola.com>
>
> arm-smmu related crashes seen after a Missed ISOC interrupt when
> no_interrupt=1 is used. This can happen if the hardware is still using
> the data associated with a TRB after the usb_request's ->complete call
> has been made. Instead of immediately releasing a request when a Missed
> ISOC interrupt has occurred, this change will add logic to cancel the
> request instead where it will eventually be released when the
> END_TRANSFER command has completed. This logic is similar to some of the
> cleanup done in dwc3_gadget_ep_dequeue.
This doesn't sound right. How did you determine that the hardware is
still using the data associated with the TRB? Did you check the TRB's
HWO bit?
The dwc3 driver would only give back the requests if the TRBs of the
associated requests are completed or when the device is disconnected.
If the TRB indicated missed isoc, that means that the TRB is completed
and its status was updated.
There's a special case in which dwc3 may give back requests early: the
device disconnecting. The requests should be returned with
-ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
de-initialization anyway.
We should not issue the End Transfer command just because of missed isoc. We
may want to issue End Transfer if the gadget driver is too slow and unable
to feed requests in time (causing underrun and missed isoc) to resync
with the host, but we already handle that.
I'm still not clear on what problem you're seeing. Do you have the
crash log? Tracepoints?
BR,
Thinh
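As a rough illustration of the check asked about above, the sketch below tests a TRB's ownership bit and completion status. The struct is a hypothetical stand-in for struct dwc3_trb, and the bit positions (HWO as bit 0 of the control word, TRBSTS in bits 31:28 of the size word, missed isoc encoded as 1) are assumptions recalled from drivers/usb/dwc3/core.h that should be verified against the tree in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dwc3_trb. */
struct trb_model {
	uint32_t bpl;
	uint32_t bph;
	uint32_t size;
	uint32_t ctrl;
};

/* Assumed layouts; check core.h before relying on them. */
#define TRB_CTRL_HWO		(1u << 0)
#define TRB_SIZE_TRBSTS(n)	(((n) >> 28) & 0x0fu)
#define TRBSTS_MISSED_ISOC	1u

/* True while the controller still owns the TRB. */
static bool trb_owned_by_hw(const struct trb_model *t)
{
	return (t->ctrl & TRB_CTRL_HWO) != 0;
}

/* True when the TRB has been handed back with a missed isoc status. */
static bool trb_missed_isoc(const struct trb_model *t)
{
	return !trb_owned_by_hw(t) &&
	       TRB_SIZE_TRBSTS(t->size) == TRBSTS_MISSED_ISOC;
}
```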
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc
2022-10-17 20:54 ` [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc Dan Vacura
@ 2022-10-18 1:50 ` Bagas Sanjaya
2022-10-18 2:15 ` Dan Vacura
0 siblings, 1 reply; 19+ messages in thread
From: Bagas Sanjaya @ 2022-10-18 1:50 UTC (permalink / raw)
To: Dan Vacura, linux-usb
Cc: Daniel Scally, Thinh Nguyen, Jeff Vanhoof, stable,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Michael Grzeschik, Paul Elder, linux-kernel,
linux-doc
On 10/18/22 03:54, Dan Vacura wrote:
> With the re-use of the previous completion status in 0d1c407b1a749
> ("usb: dwc3: gadget: Return proper request status") it could be possible
> that the next frame would also get dropped if the current frame has a
> missed isoc error. Ensure that an interrupt is requested for the start
> of a new frame.
>
Shouldn't the subject line say [PATCH v3 1/6]?
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-17 21:30 ` Thinh Nguyen
@ 2022-10-18 2:10 ` Dan Vacura
2022-10-18 18:45 ` Thinh Nguyen
0 siblings, 1 reply; 19+ messages in thread
From: Dan Vacura @ 2022-10-18 2:10 UTC (permalink / raw)
To: Thinh Nguyen
Cc: linux-usb@vger.kernel.org, Daniel Scally, Jeff Vanhoof,
stable@vger.kernel.org, Greg Kroah-Hartman, Jonathan Corbet,
Laurent Pinchart, Felipe Balbi, Paul Elder, Michael Grzeschik,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Hi Thinh,
On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> On Mon, Oct 17, 2022, Dan Vacura wrote:
> > From: Jeff Vanhoof <qjv001@motorola.com>
> >
> > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > no_interrupt=1 is used. This can happen if the hardware is still using
> > the data associated with a TRB after the usb_request's ->complete call
> > has been made. Instead of immediately releasing a request when a Missed
> > ISOC interrupt has occurred, this change will add logic to cancel the
> > request instead where it will eventually be released when the
> > END_TRANSFER command has completed. This logic is similar to some of the
> > cleanup done in dwc3_gadget_ep_dequeue.
>
> This doesn't sound right. How did you determine that the hardware is
> still using the data associated with the TRB? Did you check the TRB's
> HWO bit?
The problem we're seeing was mentioned in the summary of this patch
series, issue #1. Basically, with the following patch
https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
integrated, an smmu panic occurs on our Android device with the 5.15
kernel:
<3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
The uvc gadget driver appears to be the first (and only) gadget that
uses the no_interrupt=1 logic, so this seems to be a new condition for
the dwc3 driver. In our configuration, we have up to 64 requests and the
no_interrupt=1 for up to 15 requests. The list size of dep->started_list
would get up to that amount when looping through to cleanup the
completed requests. From testing and debugging the smmu panic occurs
when a -EXDEV status shows up and right after
dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
we had was the requests were getting returned to the gadget too early.
>
> The dwc3 driver would only give back the requests if the TRBs of the
> associated requests are completed or when the device is disconnected.
> If the TRB indicated missed isoc, that means that the TRB is completed
> and its status was updated.
Interesting, the device is not disconnected as we don't get the
-ESHUTDOWN status back and with this patch in place things continue
after a -EXDEV status is received.
>
> There's a special case in which dwc3 may give back requests early: the
> device disconnecting. The requests should be returned with
> -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> de-initialization anyway.
>
> We should not issue the End Transfer command just because of missed isoc. We
> may want to issue End Transfer if the gadget driver is too slow and unable
> to feed requests in time (causing underrun and missed isoc) to resync
> with the host, but we already handle that.
Hmm, isn't that what happens when we get into this
condition in dwc3_gadget_endpoint_trbs_complete():
if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
list_empty(&dep->started_list) &&
(list_empty(&dep->pending_list) || status == -EXDEV))
dwc3_stop_active_transfer(dep, true, true);
>
> I'm still not clear on what problem you're seeing. Do you have the
> crash log? Tracepoints?
>
> BR,
> Thinh
Appreciate the support!
Dan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc
2022-10-18 1:50 ` Bagas Sanjaya
@ 2022-10-18 2:15 ` Dan Vacura
2022-10-18 5:13 ` Greg Kroah-Hartman
0 siblings, 1 reply; 19+ messages in thread
From: Dan Vacura @ 2022-10-18 2:15 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: linux-usb, Daniel Scally, Thinh Nguyen, Jeff Vanhoof, stable,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Michael Grzeschik, Paul Elder, linux-kernel,
linux-doc
On Tue, Oct 18, 2022 at 08:50:03AM +0700, Bagas Sanjaya wrote:
> On 10/18/22 03:54, Dan Vacura wrote:
> > With the re-use of the previous completion status in 0d1c407b1a749
> > ("usb: dwc3: gadget: Return proper request status") it could be possible
> > that the next frame would also get dropped if the current frame has a
> > missed isoc error. Ensure that an interrupt is requested for the start
> > of a new frame.
> >
>
> Shouldn't the subject line say [PATCH v3 1/6]?
Yes. Clerical error on my side: I didn't update this after resolving a
checkpatch error. I'm not sure it matters, as this patch can exist on
its own. I can send this again with a fixed subject line, but that
may confuse others, since there's no code difference.
>
> --
> An old man doll... just what I always wanted! - Clara
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc
2022-10-18 2:15 ` Dan Vacura
@ 2022-10-18 5:13 ` Greg Kroah-Hartman
0 siblings, 0 replies; 19+ messages in thread
From: Greg Kroah-Hartman @ 2022-10-18 5:13 UTC (permalink / raw)
To: Dan Vacura
Cc: Bagas Sanjaya, linux-usb, Daniel Scally, Thinh Nguyen,
Jeff Vanhoof, stable, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Michael Grzeschik, Paul Elder, linux-kernel,
linux-doc
On Mon, Oct 17, 2022 at 09:15:43PM -0500, Dan Vacura wrote:
> On Tue, Oct 18, 2022 at 08:50:03AM +0700, Bagas Sanjaya wrote:
> > On 10/18/22 03:54, Dan Vacura wrote:
> > > With the re-use of the previous completion status in 0d1c407b1a749
> > > ("usb: dwc3: gadget: Return proper request status") it could be possible
> > > that the next frame would also get dropped if the current frame has a
> > > missed isoc error. Ensure that an interrupt is requested for the start
> > > of a new frame.
> > >
> >
> > Shouldn't the subject line say [PATCH v3 1/6]?
>
> Yes. Clerical error on my side: I didn't update this after resolving a
> checkpatch error. I'm not sure it matters, as this patch can exist on
> its own. I can send this again with a fixed subject line, but that
> may confuse others, since there's no code difference.
Our tools (b4) will complain it can not find patch 1 in the series, so
yes, please resend with them properly numbered so that we can find them
all when going to apply them to the tree.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-18 2:10 ` Dan Vacura
@ 2022-10-18 18:45 ` Thinh Nguyen
2022-10-18 19:13 ` Michael Grzeschik
0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2022-10-18 18:45 UTC (permalink / raw)
To: Dan Vacura
Cc: Thinh Nguyen, linux-usb@vger.kernel.org, Daniel Scally,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
Michael Grzeschik, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
Hi Dan,
On Mon, Oct 17, 2022, Dan Vacura wrote:
> Hi Thinh,
>
> On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > From: Jeff Vanhoof <qjv001@motorola.com>
> > >
> > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > the data associated with a TRB after the usb_request's ->complete call
> > > has been made. Instead of immediately releasing a request when a Missed
> > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > request instead where it will eventually be released when the
> > > END_TRANSFER command has completed. This logic is similar to some of the
> > > cleanup done in dwc3_gadget_ep_dequeue.
> >
> > This doesn't sound right. How did you determine that the hardware is
> > still using the data associated with the TRB? Did you check the TRB's
> > HWO bit?
>
> The problem we're seeing was mentioned in the summary of this patch
> series, issue #1. Basically, with the following patch
> https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
> integrated, an smmu panic occurs on our Android device with the 5.15
> kernel:
>
> <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>
> The uvc gadget driver appears to be the first (and only) gadget that
> uses the no_interrupt=1 logic, so this seems to be a new condition for
> the dwc3 driver. In our configuration, we have up to 64 requests and the
> no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> would get up to that amount when looping through to cleanup the
> completed requests. From testing and debugging the smmu panic occurs
> when a -EXDEV status shows up and right after
> dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> we had was the requests were getting returned to the gadget too early.
As I mentioned, if the status is updated to missed isoc, that means that
the controller returned ownership of the TRB to the driver. At least for
the particular request with -EXDEV, its TRBs are completed. I'm not
clear on your conclusion.
Do we know where the crash occurred? Is it from the dwc3 driver or from the uvc
driver, and at what line? It'd be great if we could see the driver log.
>
> >
> > The dwc3 driver would only give back the requests if the TRBs of the
> > associated requests are completed or when the device is disconnected.
> > If the TRB indicated missed isoc, that means that the TRB is completed
> > and its status was updated.
>
> Interesting, the device is not disconnected as we don't get the
> -ESHUTDOWN status back and with this patch in place things continue
> after a -EXDEV status is received.
>
Actually, minor correction here: a recent change
b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
changed the request status from -ESHUTDOWN to -ECONNRESET when disabling the endpoint.
This doesn't look right.
While disabling endpoint may also apply for other cases such as
switching alternate interface in addition to disconnect, -ESHUTDOWN
seems more fitting there.
Hi Michael,
Can you help clarify the change above? This changed the usage of
requests. Now requests returned by disconnection won't be returned as
-ESHUTDOWN.
> >
> > There's a special case in which dwc3 may give back requests early: the
> > device disconnecting. The requests should be returned with
> > -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
> > de-initialization anyway.
> >
> > We should not issue the End Transfer command just because of missed isoc. We
> > may want to issue End Transfer if the gadget driver is too slow and unable
> > to feed requests in time (causing underrun and missed isoc) to resync
> > with the host, but we already handle that.
>
> Hmm, isn't that what happens when we get into this
> condition in dwc3_gadget_endpoint_trbs_complete():
>
> if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> list_empty(&dep->started_list) &&
> (list_empty(&dep->pending_list) || status == -EXDEV))
> dwc3_stop_active_transfer(dep, true, true);
>
Yes, it's being handled there.
> >
> > I'm still not clear on what problem you're seeing. Do you have the
> > crash log? Tracepoints?
> >
>
> Appreciate the support!
>
Thanks,
Thinh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-18 18:45 ` Thinh Nguyen
@ 2022-10-18 19:13 ` Michael Grzeschik
2022-10-18 22:45 ` Thinh Nguyen
0 siblings, 1 reply; 19+ messages in thread
From: Michael Grzeschik @ 2022-10-18 19:13 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Dan Vacura, linux-usb@vger.kernel.org, Daniel Scally,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Hi Thinh,
On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
>On Mon, Oct 17, 2022, Dan Vacura wrote:
>> On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
>> > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > From: Jeff Vanhoof <qjv001@motorola.com>
>> > >
>> > > arm-smmu related crashes seen after a Missed ISOC interrupt when
>> > > no_interrupt=1 is used. This can happen if the hardware is still using
>> > > the data associated with a TRB after the usb_request's ->complete call
>> > > has been made. Instead of immediately releasing a request when a Missed
>> > > ISOC interrupt has occurred, this change will add logic to cancel the
>> > > request instead where it will eventually be released when the
>> > > END_TRANSFER command has completed. This logic is similar to some of the
>> > > cleanup done in dwc3_gadget_ep_dequeue.
>> >
>> > This doesn't sound right. How did you determine that the hardware is
>> > still using the data associated with the TRB? Did you check the TRB's
>> > HWO bit?
>>
>> The problem we're seeing was mentioned in the summary of this patch
>> series, issue #1. Basically, with the following patch
>> https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
>> integrated, an smmu panic occurs on our Android device with the 5.15
>> kernel:
>>
>> <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>>
>> The uvc gadget driver appears to be the first (and only) gadget that
>> uses the no_interrupt=1 logic, so this seems to be a new condition for
>> the dwc3 driver. In our configuration, we have up to 64 requests and the
>> no_interrupt=1 for up to 15 requests. The list size of dep->started_list
>> would get up to that amount when looping through to cleanup the
>> completed requests. From testing and debugging the smmu panic occurs
>> when a -EXDEV status shows up and right after
>> dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
>> we had was the requests were getting returned to the gadget too early.
>
>As I mentioned, if the status is updated to missed isoc, that means that
>the controller returned ownership of the TRB to the driver. At least for
>the particular request with -EXDEV, its TRBs are completed. I'm not
>clear on your conclusion.
>
>Do we know where did the crash occur? Is it from dwc3 driver or from uvc
>driver, and at what line? It'd great if we can see the driver log.
>
>>
>> >
>> > The dwc3 driver would only give back the requests if the TRBs of the
>> > associated requests are completed or when the device is disconnected.
>> > If the TRB indicated missed isoc, that means that the TRB is completed
>> > and its status was updated.
>>
>> Interesting, the device is not disconnected as we don't get the
>> -ESHUTDOWN status back and with this patch in place things continue
>> after a -EXDEV status is received.
>>
>
>Actually, minor correction here: a recent change
>b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
>changed -ESHUTDOWN request status to -ECONNRESET when disable endpoint.
>This doesn't look right.
>
>While disabling endpoint may also apply for other cases such as
>switching alternate interface in addition to disconnect, -ESHUTDOWN
>seems more fitting there.
>
>Hi Michael,
>
>Can you help clarify for the change above? This changed the usage of
>requests. Now requests returned by disconnection won't be returned as
>-ESHUTDOWN.

When writing the patch, I was looking into
Documentation/driver-api/usb/error-codes.rst.

After looking into it today, I see that ESHUTDOWN should be sent on
ep_disable (device disable) and ECONNRESET on stop_active_transfer.
So I probably just mixed them up while writing the patch. :/

The followup patch would then just be to swap the status results of
__dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
dwc3_remove_requests call.

Michael

>> >
>> > There's a special case which dwc3 may give back requests early is the
>> > case of the device disconnecting. The requests should be returned with
>> > -ESHUTDOWN, and the gadget driver shouldn't be re-using the requests on
>> > de-initialization anyway.
>> >
>> > We should not issue End Transfer command just because of missed isoc. We
>> > may want issue End Transfer if the gadget driver is too slow and unable
>> > to feed requests in time (causing underrun and missed isoc) to resync
>> > with the host, but we already handle that.
>>
>> Hmm, isn't that what happens when we get into this
>> condition in dwc3_gadget_endpoint_trbs_complete():
>>
>> if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> list_empty(&dep->started_list) &&
>> (list_empty(&dep->pending_list) || status == -EXDEV))
>> dwc3_stop_active_transfer(dep, true, true);
>>
>
>Yes, it's being handled there.
>
>> >
>> > I'm still not clear what's the problem you're seeing. Do you have the
>> > crash log? Tracepoints?
>> >
>>
>> Appreciate the support!
>>
>
>Thanks,
>Thinh
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-18 19:13 ` Michael Grzeschik
@ 2022-10-18 22:45 ` Thinh Nguyen
2022-10-19 6:46 ` Michael Grzeschik
0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2022-10-18 22:45 UTC (permalink / raw)
To: Michael Grzeschik
Cc: Thinh Nguyen, Dan Vacura, linux-usb@vger.kernel.org,
Daniel Scally, Jeff Vanhoof, stable@vger.kernel.org,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
On Tue, Oct 18, 2022, Michael Grzeschik wrote:
> Hi Thinh,
>
> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
> > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
> > > > > From: Jeff Vanhoof <qjv001@motorola.com>
> > > > >
> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
> > > > > the data associated with a TRB after the usb_request's ->complete call
> > > > > has been made. Instead of immediately releasing a request when a Missed
> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
> > > > > request instead where it will eventually be released when the
> > > > > END_TRANSFER command has completed. This logic is similar to some of the
> > > > > cleanup done in dwc3_gadget_ep_dequeue.
> > > >
> > > > This doesn't sound right. How did you determine that the hardware is
> > > > still using the data associated with the TRB? Did you check the TRB's
> > > > HWO bit?
> > >
> > > The problem we're seeing was mentioned in the summary of this patch
> > > series, issue #1. Basically, with the following patch
> > > https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
> > > integrated a smmu panic is occurring on our Android device with the 5.15
> > > kernel which is:
> > >
> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
> > >
> > > The uvc gadget driver appears to be the first (and only) gadget that
> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
> > > the dwc3 driver. In our configuration, we have up to 64 requests and the
> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
> > > would get up to that amount when looping through to cleanup the
> > > completed requests. From testing and debugging the smmu panic occurs
> > > when a -EXDEV status shows up and right after
> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
> > > we had was the requests were getting returned to the gadget too early.
> >
> > As I mentioned, if the status is updated to missed isoc, that means that
> > the controller returned ownership of the TRB to the driver. At least for
> > the particular request with -EXDEV, its TRBs are completed. I'm not
> > clear on your conclusion.
> >
> > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
> > driver, and at what line? It'd great if we can see the driver log.
> >
> > >
> > > >
> > > > The dwc3 driver would only give back the requests if the TRBs of the
> > > > associated requests are completed or when the device is disconnected.
> > > > If the TRB indicated missed isoc, that means that the TRB is completed
> > > > and its status was updated.
> > >
> > > Interesting, the device is not disconnected as we don't get the
> > > -ESHUTDOWN status back and with this patch in place things continue
> > > after a -EXDEV status is received.
> > >
> >
> > Actually, minor correction here: a recent change
> > b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
> > changed -ESHUTDOWN request status to -ECONNRESET when disable endpoint.
> > This doesn't look right.
> >
> > While disabling endpoint may also apply for other cases such as
> > switching alternate interface in addition to disconnect, -ESHUTDOWN
> > seems more fitting there.
> >
> > Hi Michael,
> >
> > Can you help clarify for the change above? This changed the usage of
> > requests. Now requests returned by disconnection won't be returned as
> > -ESHUTDOWN.
>
> When writing the patch, I was looking into
> Documentation/driver-api/usb/error-codes.rst.
>
> After looking into it today, I see that ESHUTDOWN should be send on
> ep_disable (device disable) and ECONNRESET on stop_active_transfer.
> So I probably just mixed them up, while writing the patch. :/
>
I think you mean ECONNRESET for ep_dequeue()?
dwc3_stop_active_transfer() is called for both scenarios.
> The followup patch would then just be to swap the status results of
> __dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
> dwc3_remove_requests call.
>
> Michael
Can you help make a fix?
Thanks!
Thinh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-18 22:45 ` Thinh Nguyen
@ 2022-10-19 6:46 ` Michael Grzeschik
0 siblings, 0 replies; 19+ messages in thread
From: Michael Grzeschik @ 2022-10-19 6:46 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Dan Vacura, linux-usb@vger.kernel.org, Daniel Scally,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Hi Thinh,
On Tue, Oct 18, 2022 at 10:45:16PM +0000, Thinh Nguyen wrote:
>On Tue, Oct 18, 2022, Michael Grzeschik wrote:
>> On Tue, Oct 18, 2022 at 06:45:40PM +0000, Thinh Nguyen wrote:
>> > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > On Mon, Oct 17, 2022 at 09:30:38PM +0000, Thinh Nguyen wrote:
>> > > > On Mon, Oct 17, 2022, Dan Vacura wrote:
>> > > > > From: Jeff Vanhoof <qjv001@motorola.com>
>> > > > >
>> > > > > arm-smmu related crashes seen after a Missed ISOC interrupt when
>> > > > > no_interrupt=1 is used. This can happen if the hardware is still using
>> > > > > the data associated with a TRB after the usb_request's ->complete call
>> > > > > has been made. Instead of immediately releasing a request when a Missed
>> > > > > ISOC interrupt has occurred, this change will add logic to cancel the
>> > > > > request instead where it will eventually be released when the
>> > > > > END_TRANSFER command has completed. This logic is similar to some of the
>> > > > > cleanup done in dwc3_gadget_ep_dequeue.
>> > > >
>> > > > This doesn't sound right. How did you determine that the hardware is
>> > > > still using the data associated with the TRB? Did you check the TRB's
>> > > > HWO bit?
>> > >
>> > > The problem we're seeing was mentioned in the summary of this patch
>> > > series, issue #1. Basically, with the following patch
>> > > https://patchwork.kernel.org/project/linux-usb/patch/20210628155311.16762-6-m.grzeschik@pengutronix.de/
>> > > integrated a smmu panic is occurring on our Android device with the 5.15
>> > > kernel which is:
>> > >
>> > > <3>[ 718.314900][ T803] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault from a600000.dwc3!
>> > >
>> > > The uvc gadget driver appears to be the first (and only) gadget that
>> > > uses the no_interrupt=1 logic, so this seems to be a new condition for
>> > > the dwc3 driver. In our configuration, we have up to 64 requests and the
>> > > no_interrupt=1 for up to 15 requests. The list size of dep->started_list
>> > > would get up to that amount when looping through to cleanup the
>> > > completed requests. From testing and debugging the smmu panic occurs
>> > > when a -EXDEV status shows up and right after
>> > > dwc3_gadget_ep_cleanup_completed_request() was visited. The conclusion
>> > > we had was the requests were getting returned to the gadget too early.
>> >
>> > As I mentioned, if the status is updated to missed isoc, that means that
>> > the controller returned ownership of the TRB to the driver. At least for
>> > the particular request with -EXDEV, its TRBs are completed. I'm not
>> > clear on your conclusion.
>> >
>> > Do we know where did the crash occur? Is it from dwc3 driver or from uvc
>> > driver, and at what line? It'd great if we can see the driver log.
>> >
>> > >
>> > > >
>> > > > The dwc3 driver would only give back the requests if the TRBs of the
>> > > > associated requests are completed or when the device is disconnected.
>> > > > If the TRB indicated missed isoc, that means that the TRB is completed
>> > > > and its status was updated.
>> > >
>> > > Interesting, the device is not disconnected as we don't get the
>> > > -ESHUTDOWN status back and with this patch in place things continue
>> > > after a -EXDEV status is received.
>> > >
>> >
>> > Actually, minor correction here: a recent change
>> > b44c0e7fef51 ("usb: dwc3: gadget: conditionally remove requests")
>> > changed -ESHUTDOWN request status to -ECONNRESET when disable endpoint.
>> > This doesn't look right.
>> >
>> > While disabling endpoint may also apply for other cases such as
>> > switching alternate interface in addition to disconnect, -ESHUTDOWN
>> > seems more fitting there.
>> >
>> > Hi Michael,
>> >
>> > Can you help clarify for the change above? This changed the usage of
>> > requests. Now requests returned by disconnection won't be returned as
>> > -ESHUTDOWN.
>>
>> When writing the patch, I was looking into
>> Documentation/driver-api/usb/error-codes.rst.
>>
>> After looking into it today, I see that ESHUTDOWN should be send on
>> ep_disable (device disable) and ECONNRESET on stop_active_transfer.
>> So I probably just mixed them up, while writing the patch. :/
>>
>
>I think you mean ECONNRESET for ep_dequeue()?
>dwc3_stop_active_transfer() is called for both scenarios.
No, I meant dwc3_stop_active_transfer*s*.
On ep_dequeue the request status is already ECONNRESET.
>> The followup patch would then just be to swap the status results of
>> __dwc3_gadget_ep_disable and dwc3_stop_active_transfers on the
>> dwc3_remove_requests call.
>>
>> Michael
>
>Can you help make a fix?
Sure, I will write a patch.
Thanks,
Michael
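
For readers following the status-code discussion, here is a minimal user-space sketch of how a cancel reason could map to the status passed to ->complete(). The enum and function names are hypothetical and only loosely mirror the DWC3_REQUEST_STATUS_* cases quoted in this thread. The -EXDEV case reflects what the patch under discussion proposes, not established driver behaviour.

```c
#include <errno.h>

/* Hypothetical cancel reasons for this model only. */
enum cancel_reason {
	CANCEL_DISCONNECTED,	/* endpoint disabled or device disconnected */
	CANCEL_DEQUEUED,	/* gadget driver dequeued the request */
	CANCEL_STALLED,		/* endpoint was halted */
	CANCEL_MISSED_ISOC,	/* isoc interval missed (proposed in this series) */
};

/* Map a cancel reason to the status a gadget driver would see. */
int cancel_reason_to_status(enum cancel_reason reason)
{
	switch (reason) {
	case CANCEL_DISCONNECTED:
		return -ESHUTDOWN;
	case CANCEL_DEQUEUED:
		return -ECONNRESET;
	case CANCEL_STALLED:
		return -EPIPE;
	case CANCEL_MISSED_ISOC:
		return -EXDEV;
	}
	/* Unknown reason: fall back to -ECONNRESET, like the default case in the quoted diff. */
	return -ECONNRESET;
}
```

A gadget driver can then tell "the endpoint is going away" (-ESHUTDOWN) apart from "this one request was taken back" (-ECONNRESET), which is the distinction the follow-up fix is meant to restore.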
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2022-10-17 20:54 ` [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release " Dan Vacura
2022-10-17 21:30 ` Thinh Nguyen
@ 2024-02-22 0:02 ` Michael Grzeschik
2024-02-22 1:20 ` Thinh Nguyen
1 sibling, 1 reply; 19+ messages in thread
From: Michael Grzeschik @ 2024-02-22 0:02 UTC (permalink / raw)
To: Dan Vacura, Thinh Nguyen
Cc: linux-usb, Daniel Scally, Thinh Nguyen, Jeff Vanhoof, stable,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, linux-kernel, linux-doc
Sorry for digging up this grave! :)

I once more came across the whole situation that we have been
encountering for a year or so, and found some reasons why:

#1 there are so many latencies that the system is not fast enough to
   enqueue requests back into a running HW transfer. At least on our
   system setup.

and

#2 there are so many missed transfers leading to broken frames
   when adding requests with no_interrupt set.

For #1: There are sometimes situations in the system where the threaded
interrupt handler for the dwc3 is not called fast enough, although the
HW irq fired early, enqueued the irq event and woke the irq thread
early. In our case this often happens when there are other tasks
involved on the same CPU and the scheduler is not able to schedule the
irq thread in the necessary time. In our case the main issue is a
HW irq handler of the ethernet controller (cadence macb) that runs
berserk on CPU0 and therefore takes a lot of CPU time. By default, all
irq handlers on our system run on the same CPU. Since by definition
every interrupt thread is started on the same CPU on which its irq was
called, this puts a lot of pressure on one core. So changing the
smp_affinity of the dwc3 irq to the second CPU only already solves
a lot of the underruns.
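
The affinity change described for #1 can be sketched in user space as follows. This is only an illustration: the helper names are hypothetical, and it assumes the standard /proc/irq/<irq>/smp_affinity interface, which takes a hexadecimal CPU bitmask and needs root privileges to write.

```c
#include <stdio.h>

/* Bitmask selecting a single CPU, or 0 if the CPU does not fit in one 32-bit word. */
unsigned int cpu_affinity_mask(unsigned int cpu)
{
	return cpu < 32 ? 1u << cpu : 0;
}

/* Write the single-CPU mask to /proc/irq/<irq>/smp_affinity. Returns 0 on success. */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
	unsigned int mask = cpu_affinity_mask(cpu);
	char path[64];
	FILE *f;
	int ret = 0;

	if (!mask)
		return -1;

	snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;

	/* The kernel parses the mask as hex, e.g. "2" selects CPU1 only. */
	if (fprintf(f, "%x\n", mask) < 0)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

Calling pin_irq_to_cpu(irq, 1) would correspond to moving the dwc3 irq to the second CPU. The irq number is system specific and would have to be looked up in /proc/interrupts.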

For #2: I found an issue in the handling of the completion of requests in
the started list. The interrupt handler *explicitly* calls
stop_active_transfer if the overall event of the request was a missed
event. This event value only represents the status of the request that
actually triggered the interrupt.

It also calls ep_cleanup_completed_requests, which iterates over the
started requests and calls the giveback/complete functions of the
requests with the proper request status.

So this will also catch missed requests in the queue. However, since
there might be, let's say, 5 good requests and one missed request, what
will happen is that each complete call for the first good requests will
enqueue new requests into the started list and will also call the
updatecmd on that transfer that was already missed, until the loop
reaches the one request with the MISSED status bit set.

So in my opinion the patch from Jeff makes sense when adding the
following change as well. With both changes the underruns and
broken frames finally disappear. I am still unsure whether this is the
complete solution, since with this the mentioned 5 good requests
will be cancelled as well. So this is still WIP here.

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index e031813c5769b..b991d25bbf897 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3509,6 +3509,45 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
return ret;
}
+static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
+{
+ struct dwc3_request *req;
+ struct dwc3_request *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ struct dwc3_trb *trb;
+
+ /* TODO: check if the trb association is correct */
+ trb = req->trb;
+ switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
+ case DWC3_TRBSTS_MISSED_ISOC:
+ /* Isoc endpoint only */
+ ret = -EXDEV;
+ break;
+ case DWC3_TRB_STS_XFER_IN_PROG:
+ /* Applicable when End Transfer with ForceRM=0 */
+ case DWC3_TRBSTS_SETUP_PENDING:
+ /* Control endpoint only */
+ case DWC3_TRBSTS_OK:
+ default:
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
const struct dwc3_event_depevt *event, int status)
{
@@ -3566,7 +3605,7 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- if (status == -EXDEV) {
+ if (status == -EXDEV || dwc3_gadget_ep_check_missed_requests(dep)) {
struct dwc3_request *tmp;
struct dwc3_request *req;
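
One property of the scan above is worth spelling out with a small user-space model. In the helper as posted, ret is reassigned for every request in the list (the break statements only leave the switch), so the return value reflects the last request visited. The sketch below uses hypothetical names and a plain array instead of the started list. It contrasts that behaviour with a variant that reports a miss anywhere in the list, which is assumed here to be the intent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TRB completion states for this model only. */
enum trb_status { TRB_OK, TRB_MISSED_ISOC, TRB_IN_PROGRESS };

/* Overwrites the result on every iteration: only the last entry counts. */
bool last_is_missed(const enum trb_status *started, size_t n)
{
	bool missed = false;

	for (size_t i = 0; i < n; i++)
		missed = (started[i] == TRB_MISSED_ISOC);
	return missed;
}

/* Stops at the first missed entry: a miss anywhere in the list counts. */
bool any_missed(const enum trb_status *started, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (started[i] == TRB_MISSED_ISOC)
			return true;
	return false;
}
```

For a list such as OK, MISSED, OK the two variants disagree, which matters for the "5 good requests and one missed request" case described earlier.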
On Mon, Oct 17, 2022 at 03:54:40PM -0500, Dan Vacura wrote:
>From: Jeff Vanhoof <qjv001@motorola.com>
>
>arm-smmu related crashes seen after a Missed ISOC interrupt when
>no_interrupt=1 is used. This can happen if the hardware is still using
>the data associated with a TRB after the usb_request's ->complete call
>has been made. Instead of immediately releasing a request when a Missed
>ISOC interrupt has occurred, this change will add logic to cancel the
>request instead where it will eventually be released when the
>END_TRANSFER command has completed. This logic is similar to some of the
>cleanup done in dwc3_gadget_ep_dequeue.
>
>Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
>Cc: <stable@vger.kernel.org>
>Signed-off-by: Jeff Vanhoof <qjv001@motorola.com>
>Co-developed-by: Dan Vacura <w36195@motorola.com>
>Signed-off-by: Dan Vacura <w36195@motorola.com>
>---
>V1 -> V3:
>- no change, new patch in series
>
> drivers/usb/dwc3/core.h | 1 +
> drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
> 2 files changed, 27 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>index 8f9959ba9fd4..9b005d912241 100644
>--- a/drivers/usb/dwc3/core.h
>+++ b/drivers/usb/dwc3/core.h
>@@ -943,6 +943,7 @@ struct dwc3_request {
> #define DWC3_REQUEST_STATUS_DEQUEUED 3
> #define DWC3_REQUEST_STATUS_STALLED 4
> #define DWC3_REQUEST_STATUS_COMPLETED 5
>+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> #define DWC3_REQUEST_STATUS_UNKNOWN -1
>
> u8 epnum;
>diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>index 079cd333632e..411532c5c378 100644
>--- a/drivers/usb/dwc3/gadget.c
>+++ b/drivers/usb/dwc3/gadget.c
>@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> case DWC3_REQUEST_STATUS_STALLED:
> dwc3_gadget_giveback(dep, req, -EPIPE);
> break;
>+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
>+ dwc3_gadget_giveback(dep, req, -EXDEV);
>+ break;
> default:
> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> dwc3_gadget_giveback(dep, req, -ECONNRESET);
>@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> struct dwc3 *dwc = dep->dwc;
> bool no_started_trb = true;
>
>- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>+ if (status == -EXDEV) {
>+ struct dwc3_request *tmp;
>+ struct dwc3_request *req;
>
>- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>- goto out;
>+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>+ dwc3_stop_active_transfer(dep, true, true);
>
>- if (!dep->endpoint.desc)
>- return no_started_trb;
>+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
>+ dwc3_gadget_move_cancelled_request(req,
>+ DWC3_REQUEST_STATUS_MISSED_ISOC);
>+ } else {
>+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>
>- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>- list_empty(&dep->started_list) &&
>- (list_empty(&dep->pending_list) || status == -EXDEV))
>- dwc3_stop_active_transfer(dep, true, true);
>- else if (dwc3_gadget_ep_should_continue(dep))
>- if (__dwc3_gadget_kick_transfer(dep) == 0)
>- no_started_trb = false;
>+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>+ goto out;
>+
>+ if (!dep->endpoint.desc)
>+ return no_started_trb;
>+
>+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
>+ dwc3_stop_active_transfer(dep, true, true);
>+ else if (dwc3_gadget_ep_should_continue(dep))
>+ if (__dwc3_gadget_kick_transfer(dep) == 0)
>+ no_started_trb = false;
>+ }
>
> out:
> /*
>--
>2.34.1
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2024-02-22 0:02 ` Michael Grzeschik
@ 2024-02-22 1:20 ` Thinh Nguyen
2024-02-27 21:01 ` Michael Grzeschik
0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2024-02-22 1:20 UTC (permalink / raw)
To: Michael Grzeschik
Cc: Dan Vacura, Thinh Nguyen, linux-usb@vger.kernel.org,
Daniel Scally, Jeff Vanhoof, stable@vger.kernel.org,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> Sorry for digging up this grave! :)
>
> I once more came accross the whole situation we are still encountering
> since one year or so again and found the some reasons why:
>
> #1 there are so many latencies, so that the system is not fast enough to
> enqueue requests back into an running HW-Transfer. At least on our
> system setup.
>
> and
>
> #2 there are so many missed transfers leading to broken frames
> when adding request with no_interrupt set.
>
> For #1: There sometimes are situations in the system where the threaded
> interrupt handler for the dwc3 is not called fast enough, although the
> HW-irq was called early and enqueued the irq event and woke the irq
> thread early. In our case this often happens, when there are other tasks
> involved on the same CPU and the scheduler is not able to pipeline the
> irq thread in the necessary time. In our case the main issue is an
> HW-irq handler of the ethernet controller (cadence macb) that runs
> berserk on CPU0 and therefor is taking a lot of CPU time. Per default on
> our system all irq handlers are running on the same CPU. As per
> definition all interrupt threads will be started on the same CPU as the
> irq was called, this forces a lot of pressure on one Core. So changing
> the smp_affinity of the dwc3 irq to the second CPU only, already solves
> a lot of the underruns.
That's great!
>
> For #2: I found an issue in the handling of the completion of requests in
> the started list. When the interrupt handler is *explicitly* calling
> stop_active_transfer if the overall event of the request was an missed
> event. This event value only represents the value of the request that
> was actually triggering the interrupt.
>
> It also calls ep_cleanup_completed_requests and is iterating over the
> started requests and will call giveback/complete functions of the
> requests with the proper request status.
>
> So this will also catch missed requests in the queue. However, since
> there might be, lets say 5 good requests and one missed request, what
> will happen is, that each complete call for the first good requests will
> enqueue new requests into the started list and will also call the
> updatecmd on that transfer that was already missed until the loop will
> reach the one request with the MISSED status bit set.
>
> So in my opinion the patch from Jeff makes sense when adding the
> following change aswell. With those both changes the underruns and
> broken frames finally disappear. I am still unsure about the complete
> solution about that, since with this the mentioned 5 good requests
> will be cancelled aswell. So this is still a WIP status here.
>

When the dwc3 driver issues stop_active_transfer(), that means that the
started_list is empty and there is an underrun. It treats the incoming
requests as stale. However, for UVC, they are still "good".

I think you can just check if the started_list is empty before queuing
new requests. If it is, perform stop_active_transfer() to reschedule the
incoming requests. None of the newly queued requests will be released
yet since they are in the pending_list.
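
As a rough user-space sketch of that check (the names are hypothetical and this is not dwc3 code), the decision at queue time could look like this:

```c
#include <stdbool.h>

enum queue_action {
	QUEUE_APPEND,		/* transfer still running: just add the request */
	QUEUE_STOP_AND_RESTART,	/* ring ran dry: end the transfer and reschedule */
};

/*
 * Decide what to do when the function driver queues a request.
 * `started` models the number of requests on the started list and
 * `transfer_active` says whether a transfer was started at all.
 */
enum queue_action on_queue(bool transfer_active, unsigned int started)
{
	if (transfer_active && started == 0)
		return QUEUE_STOP_AND_RESTART;
	return QUEUE_APPEND;
}
```

In both cases the new request itself would only sit on the pending list, so nothing is given back early.
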
For UVC, perhaps you can introduce a new flag to usb_request called
"ignore_queue_latency" or something equivalent. The dwc3 is already
partially doing this for UVC. With this new flag, we can rework dwc3 to
clearly separate the expected behavior from the function driver.
BR,
Thinh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2024-02-22 1:20 ` Thinh Nguyen
@ 2024-02-27 21:01 ` Michael Grzeschik
2024-03-07 1:57 ` Thinh Nguyen
0 siblings, 1 reply; 19+ messages in thread
From: Michael Grzeschik @ 2024-02-27 21:01 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Dan Vacura, linux-usb@vger.kernel.org, Daniel Scally,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
>On Thu, Feb 22, 2024, Michael Grzeschik wrote:
>> For #2: I found an issue in the handling of the completion of requests in
>> the started list. When the interrupt handler is *explicitly* calling
>> stop_active_transfer if the overall event of the request was an missed
>> event. This event value only represents the value of the request that
>> was actually triggering the interrupt.
>>
>> It also calls ep_cleanup_completed_requests and is iterating over the
>> started requests and will call giveback/complete functions of the
>> requests with the proper request status.
>>
>> So this will also catch missed requests in the queue. However, since
>> there might be, lets say 5 good requests and one missed request, what
>> will happen is, that each complete call for the first good requests will
>> enqueue new requests into the started list and will also call the
>> updatecmd on that transfer that was already missed until the loop will
>> reach the one request with the MISSED status bit set.
>>
>> So in my opinion the patch from Jeff makes sense when adding the
>> following change aswell. With those both changes the underruns and
>> broken frames finally disappear. I am still unsure about the complete
>> solution about that, since with this the mentioned 5 good requests
>> will be cancelled aswell. So this is still a WIP status here.
>>
>
>When the dwc3 driver issues stop_active_transfer(), that means that the
>started_list is empty and there is an underrun.

At this moment this is only the case when both the pending and the
started list are empty, or when the interrupt event was EXDEV.

The main problem is that the function
dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
issue a complete for each started request, which in turn will
refill the pending list, and therefore after that refill the
stop_active_transfer is currently never hit.

>It treats the incoming requests as staled. However, for UVC, they are
>still "good".
Right, so in that case we can requeue them anyway. But this will have to
be done after the stop transfer cmd has finished.
>I think you can just check if the started_list is empty before queuing
>new requests. If it is, perform stop_active_transfer() to reschedule the
>incoming requests. None of the newly queue requests will be released
>yet since they are in the pending_list.

So that is basically exactly what my patch is doing. However, in the case
of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests,
as Jeff stated. So his underlying patch is really fixing an issue here.

>For UVC, perhaps you can introduce a new flag to usb_request called
>"ignore_queue_latency" or something equivalent. The dwc3 is already
>partially doing this for UVC. With this new flag, we can rework dwc3 to
>clearly separate the expected behavior from the function driver.

I don't know why this "extra" flag is even necessary. The code example
is already working without that extra flag.

Actually I even came up with a better solution. In addition to checking
whether one of the requests in the started list was missed, we can
actively check whether the trb ring ran dry and whether
dwc3_gadget_endpoint_trbs_complete is going to enqueue into the empty
trb ring.
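
The idea of the "ran dry" check can be modelled in user space roughly as below. All names, the ring size and the bit position are assumptions made for this sketch: the last slot of the ring is treated as a link TRB, and the hardware-owned bit is modelled as bit 0 of the control word.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRB_NUM		8u		/* hypothetical ring size, last slot is the link TRB */
#define TRB_CTRL_HWO	(1u << 0)	/* modelled "hardware owns this TRB" bit */

struct trb {
	uint32_t ctrl;
};

/* Index of the TRB queued just before `enqueue`, stepping back over the link TRB. */
unsigned int prev_trb_index(unsigned int enqueue)
{
	if (enqueue == 0)
		enqueue = TRB_NUM - 1;
	return enqueue - 1;
}

/*
 * If the most recently queued TRB is still owned by the hardware, the
 * transfer is still in flight. If not, the ring ran dry and new
 * requests would be queued behind an already finished transfer.
 */
bool transfer_in_flight(const struct trb *ring, unsigned int enqueue)
{
	return ring[prev_trb_index(enqueue)].ctrl & TRB_CTRL_HWO;
}
```

Looking only at the last queued TRB is enough for this purpose, because the hardware consumes the ring in order.
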
So my whole change looks like this:
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index efe6caf4d0e87..2c8047dcd1612 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -952,6 +952,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 858fe4c299b7a..a31f4d3502bd3 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
req = next_request(&dep->cancelled_list);
dwc3_gadget_ep_skip_trbs(dep, req);
switch (req->status) {
+ case 0:
+ dwc3_gadget_giveback(dep, req, 0);
+ break;
case DWC3_REQUEST_STATUS_DISCONNECTED:
dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
break;
@@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
return ret;
}
+static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
+{
+ struct dwc3_request *req;
+ struct dwc3_request *tmp;
+ int ret = 0;
+
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ struct dwc3_trb *trb;
+
+ trb = req->trb;
+ switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
+ case DWC3_TRBSTS_MISSED_ISOC:
+ /* Isoc endpoint only */
+ ret = -EXDEV;
+ break;
+ case DWC3_TRB_STS_XFER_IN_PROG:
+ /* Applicable when End Transfer with ForceRM=0 */
+ case DWC3_TRBSTS_SETUP_PENDING:
+ /* Control endpoint only */
+ case DWC3_TRBSTS_OK:
+ default:
+ ret = 0;
+ break;
+ }
+ }
+
+ return ret;
+}
+
static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
const struct dwc3_event_depevt *event, int status)
{
@@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
{
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
+ unsigned int transfer_in_flight = 0;
+
+ /* It is possible that the interrupt thread was delayed by
+ * scheduling in the system, and therefore the HW has already
+ * run dry. In that case the last trb in the queue is already
+ * handled by the hw. By checking the HWO bit we know to restart
+ * the whole transfer. The condition is more likely to appear
+ * if not every trb has the IOC bit set, because the interrupt
+ * thread is then triggered less often.
+ */
+ if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
+ struct dwc3_trb *trb;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
+ transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
+ }
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (status == -EXDEV || !transfer_in_flight) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (!dep->endpoint.desc)
- return no_started_trb;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
+ dwc3_gadget_move_cancelled_request(req,
+ (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
+ DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
+ }
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
I will separate the whole hunk into smaller changes and send a v1
in the next few days for review.
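To show the run-dry check in isolation, here is a minimal user-space sketch.
It is not the driver code: the ring is a plain array without a link TRB, and
struct trb_model, struct ring_model, TRB_CTRL_HWO_MODEL and the ring_*()
helpers are hypothetical names made up for this illustration. The idea is the
same as in the hunk above: look at the TRB just before the enqueue pointer,
and if the hardware no longer owns it, the ring has probably run dry.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE_MODEL     8u          /* hypothetical ring size */
#define TRB_CTRL_HWO_MODEL  (1u << 0)   /* stand-in for DWC3_TRB_CTRL_HWO */

struct trb_model {
	uint32_t ctrl;
};

struct ring_model {
	struct trb_model trbs[RING_SIZE_MODEL];
	unsigned int enqueue;   /* next slot software will fill */
};

/* Return the TRB that was enqueued last, wrapping around at index 0. */
static struct trb_model *ring_prev(struct ring_model *ring)
{
	unsigned int idx = ring->enqueue ? ring->enqueue - 1
					 : RING_SIZE_MODEL - 1;

	return &ring->trbs[idx];
}

/* True while the hardware still owns the most recently queued TRB. */
static bool ring_in_flight(struct ring_model *ring)
{
	return (ring_prev(ring)->ctrl & TRB_CTRL_HWO_MODEL) != 0;
}

/* Software queues one TRB and hands ownership to the hardware. */
static void ring_queue(struct ring_model *ring)
{
	ring->trbs[ring->enqueue].ctrl |= TRB_CTRL_HWO_MODEL;
	ring->enqueue = (ring->enqueue + 1) % RING_SIZE_MODEL;
}

/* Hardware finishes the TRB at idx and clears the ownership bit. */
static void ring_hw_complete(struct ring_model *ring, unsigned int idx)
{
	ring->trbs[idx % RING_SIZE_MODEL].ctrl &= ~TRB_CTRL_HWO_MODEL;
}
```

In this model the ring counts as "in flight" until the hardware has completed
the last queued TRB, no matter how many earlier TRBs are already done.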
Regards,
Michael
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2024-02-27 21:01 ` Michael Grzeschik
@ 2024-03-07 1:57 ` Thinh Nguyen
2024-03-07 16:15 ` Michael Grzeschik
0 siblings, 1 reply; 19+ messages in thread
From: Thinh Nguyen @ 2024-03-07 1:57 UTC (permalink / raw)
To: Michael Grzeschik
Cc: Thinh Nguyen, Dan Vacura, linux-usb@vger.kernel.org,
Daniel Scally, Jeff Vanhoof, stable@vger.kernel.org,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
On Tue, Feb 27, 2024, Michael Grzeschik wrote:
> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> > > For #2: I found an issue in the handling of the completion of requests in
> > > the started list. When the interrupt handler is *explicitly* calling
> > > stop_active_transfer if the overall event of the request was an missed
> > > event. This event value only represents the value of the request that
> > > was actually triggering the interrupt.
> > >
> > > It also calls ep_cleanup_completed_requests and is iterating over the
> > > started requests and will call giveback/complete functions of the
> > > requests with the proper request status.
> > >
> > > So this will also catch missed requests in the queue. However, since
> > > there might be, lets say 5 good requests and one missed request, what
> > > will happen is, that each complete call for the first good requests will
> > > enqueue new requests into the started list and will also call the
> > > updatecmd on that transfer that was already missed until the loop will
> > > reach the one request with the MISSED status bit set.
> > >
> > > So in my opinion the patch from Jeff makes sense when adding the
> > > following change aswell. With those both changes the underruns and
> > > broken frames finally disappear. I am still unsure about the complete
> > > solution about that, since with this the mentioned 5 good requests
> > > will be cancelled aswell. So this is still a WIP status here.
> > >
> >
> > When the dwc3 driver issues stop_active_transfer(), that means that the
> > started_list is empty and there is an underrun.
>
> At this moment this is only the case when both, pending and started list
> are empty. Or the interrupt event was EXDEV.
>
> The main problem is that the function
> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
> issue an complete for each started request, which on the other hand will
> refill the pending list, and therefor after that refill the
> stop_active_transfer is currently never hit.
>
> > It treats the incoming requests as staled. However, for UVC, they are
> > still "good".
>
> Right, so in that case we can requeue them anyway. But this will have to
> be done after the stop transfer cmd has finished.
>
> > I think you can just check if the started_list is empty before queuing
> > new requests. If it is, perform stop_active_transfer() to reschedule the
> > incoming requests. None of the newly queue requests will be released
> > yet since they are in the pending_list.
>
> So that is basically exactly what my patch is doing. However in the case
> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
> as jeff stated. So his underlying patch is really fixing an issue here.
What I mean is to actively check the started list on every
usb_ep_queue() call. Checking during
dwc3_gadget_ep_cleanup_completed_requests() is already too late.
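A rough sketch of that idea, as a user-space model rather than dwc3 code. The
lists are reduced to counters, and struct ep_model, enum queue_action_model
and the two helpers are hypothetical names for this illustration only. It
also assumes the transfer is restarted by the next queue call after a stop,
which simplifies what the controller really does.

```c
#include <stdbool.h>

enum queue_action_model {
	QUEUE_START,       /* no transfer running yet: start one */
	QUEUE_RESCHEDULE,  /* underrun: stop the transfer, restart later */
	QUEUE_CONTINUE,    /* hardware still busy: just hand over */
};

struct ep_model {
	bool transfer_started;  /* a transfer is active on the endpoint */
	unsigned int started;   /* requests currently owned by the hardware */
	unsigned int pending;   /* requests waiting to be handed over */
};

/* Model of the queue path: check the started list on every call. */
static enum queue_action_model ep_queue_model(struct ep_model *ep)
{
	/* A new request always lands on the pending list first. */
	ep->pending++;

	if (!ep->transfer_started) {
		ep->transfer_started = true;
		ep->started += ep->pending;
		ep->pending = 0;
		return QUEUE_START;
	}

	if (ep->started == 0) {
		/*
		 * Underrun: the hardware has nothing left. Keep the request
		 * on the pending list and stop the active transfer so that
		 * it can be rescheduled.
		 */
		ep->transfer_started = false;
		return QUEUE_RESCHEDULE;
	}

	ep->started += ep->pending;
	ep->pending = 0;
	return QUEUE_CONTINUE;
}

/* Model of one request being completed by the hardware. */
static void ep_complete_one_model(struct ep_model *ep)
{
	if (ep->started)
		ep->started--;
}
```

The point of the sketch is only the ordering: the underrun is detected at
queue time, before anything is given back, so the newly queued requests stay
on the pending list until the transfer is restarted.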
>
> > For UVC, perhaps you can introduce a new flag to usb_request called
> > "ignore_queue_latency" or something equivalent. The dwc3 is already
> > partially doing this for UVC. With this new flag, we can rework dwc3 to
> > clearly separate the expected behavior from the function driver.
>
> I don't know why this "extra" flag is even necessary. The code example
> is already working without that extra flag.
The flag is for the controller to determine what kind of behavior the
function driver expects. My intention is that if this extra flag is not set,
the dwc3 driver will not attempt to reschedule isoc requests at all (i.e.
no stop_active_transfer()).
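Roughly, in model form. The flag does not exist in struct usb_request today;
struct request_model and may_reschedule_model() are hypothetical names used
only to sketch the intended split between function driver and controller.

```c
#include <stdbool.h>

struct request_model {
	/* Proposed flag; not an existing usb_request field. */
	bool ignore_queue_latency;
};

/*
 * Decide whether the controller may stop and restart the isoc transfer.
 * Without the flag the controller never reschedules; with it, only an
 * underrun triggers a reschedule.
 */
static bool may_reschedule_model(const struct request_model *req,
				 bool underrun)
{
	if (!req->ignore_queue_latency)
		return false;

	return underrun;
}
```

A function driver like UVC would set the flag on its requests, while drivers
that do not set it would never see a stop_active_transfer() issued on their
behalf.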
>
> Actually I even came up with an better solution. Additionally of checking if
> one of the requests in the started list was missed, we can activly check if
> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
> going to enqueue in to the empty trb ring.
>
> So my whole change looks like that:
>
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index efe6caf4d0e87..2c8047dcd1612 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -952,6 +952,7 @@ struct dwc3_request {
> #define DWC3_REQUEST_STATUS_DEQUEUED 3
> #define DWC3_REQUEST_STATUS_STALLED 4
> #define DWC3_REQUEST_STATUS_COMPLETED 5
> +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> #define DWC3_REQUEST_STATUS_UNKNOWN -1
> u8 epnum;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 858fe4c299b7a..a31f4d3502bd3 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> req = next_request(&dep->cancelled_list);
> dwc3_gadget_ep_skip_trbs(dep, req);
> switch (req->status) {
> + case 0:
> + dwc3_gadget_giveback(dep, req, 0);
> + break;
> case DWC3_REQUEST_STATUS_DISCONNECTED:
> dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
> break;
> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> case DWC3_REQUEST_STATUS_STALLED:
> dwc3_gadget_giveback(dep, req, -EPIPE);
> break;
> + case DWC3_REQUEST_STATUS_MISSED_ISOC:
> + dwc3_gadget_giveback(dep, req, -EXDEV);
> + break;
> default:
> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> dwc3_gadget_giveback(dep, req, -ECONNRESET);
> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
> return ret;
> }
> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
> +{
> + struct dwc3_request *req;
> + struct dwc3_request *tmp;
> + int ret = 0;
> +
> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> + struct dwc3_trb *trb;
> +
> + trb = req->trb;
> + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
> + case DWC3_TRBSTS_MISSED_ISOC:
> + /* Isoc endpoint only */
> + ret = -EXDEV;
> + break;
> + case DWC3_TRB_STS_XFER_IN_PROG:
> + /* Applicable when End Transfer with ForceRM=0 */
> + case DWC3_TRBSTS_SETUP_PENDING:
> + /* Control endpoint only */
> + case DWC3_TRBSTS_OK:
> + default:
> + ret = 0;
> + break;
> + }
> + }
> +
> + return ret;
> +}
> +
> static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> const struct dwc3_event_depevt *event, int status)
> {
> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> {
> struct dwc3 *dwc = dep->dwc;
> bool no_started_trb = true;
> + unsigned int transfer_in_flight = 0;
> +
> + /* It is possible that the interrupt thread was delayed by
> + * scheduling in the system, and therefor the HW has already
> + * run dry. In that case the last trb in the queue is already
> + * handled by the hw. By checking the HWO bit we know to restart
> + * the whole transfer. The condition to appear is more likelely
> + * if not every trb has the IOC bit set and therefor does not
> + * trigger the interrupt thread fewer.
> + */
> + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
> + struct dwc3_trb *trb;
> - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
> + }
> - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> - goto out;
> + if (status == -EXDEV || !transfer_in_flight) {
> + struct dwc3_request *tmp;
> + struct dwc3_request *req;
> - if (!dep->endpoint.desc)
> - return no_started_trb;
> + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> + dwc3_stop_active_transfer(dep, true, true);
> - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> - list_empty(&dep->started_list) &&
> - (list_empty(&dep->pending_list) || status == -EXDEV))
> - dwc3_stop_active_transfer(dep, true, true);
> - else if (dwc3_gadget_ep_should_continue(dep))
> - if (__dwc3_gadget_kick_transfer(dep) == 0)
> - no_started_trb = false;
> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> + dwc3_gadget_move_cancelled_request(req,
> + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
> + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
> + }
> + } else {
> + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> +
> + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> + goto out;
> +
> + if (!dep->endpoint.desc)
> + return no_started_trb;
> +
> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
> + dwc3_stop_active_transfer(dep, true, true);
> + else if (dwc3_gadget_ep_should_continue(dep))
> + if (__dwc3_gadget_kick_transfer(dep) == 0)
> + no_started_trb = false;
> + }
> out:
> /*
>
> I will seperate the whole hunk into smaller changes and send an v1
> the next days to review.
>
No, we should not reschedule for every missed-isoc. We only want to
target the underrun condition.
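To make the difference concrete, here is a small model of the two conditions.
Both function names are hypothetical, and treating "underrun" as "the last
queued TRB is no longer owned by the hardware" is an assumption made for this
sketch, not a definition taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Condition used in the quoted hunk: any missed isoc, or a dry ring. */
static bool patch_reschedules_model(int status, bool transfer_in_flight)
{
	return status == -EXDEV || !transfer_in_flight;
}

/* Narrower variant: only a dry ring (underrun) triggers a reschedule. */
static bool underrun_reschedules_model(int status, bool transfer_in_flight)
{
	(void)status;  /* a missed isoc alone is not enough here */

	return !transfer_in_flight;
}
```

The two only disagree when a request was missed while the hardware still has
TRBs to work on; that is the case where the hunk stops the transfer although
no underrun has occurred.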
Thanks,
Thinh
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2024-03-07 1:57 ` Thinh Nguyen
@ 2024-03-07 16:15 ` Michael Grzeschik
2024-03-08 2:47 ` Thinh Nguyen
0 siblings, 1 reply; 19+ messages in thread
From: Michael Grzeschik @ 2024-03-07 16:15 UTC (permalink / raw)
To: Thinh Nguyen
Cc: Dan Vacura, linux-usb@vger.kernel.org, Daniel Scally,
Jeff Vanhoof, stable@vger.kernel.org, Greg Kroah-Hartman,
Jonathan Corbet, Laurent Pinchart, Felipe Balbi, Paul Elder,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
On Thu, Mar 07, 2024 at 01:57:44AM +0000, Thinh Nguyen wrote:
>On Tue, Feb 27, 2024, Michael Grzeschik wrote:
>> On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
>> > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
>> > > For #2: I found an issue in the handling of the completion of requests in
>> > > the started list. When the interrupt handler is *explicitly* calling
>> > > stop_active_transfer if the overall event of the request was an missed
>> > > event. This event value only represents the value of the request that
>> > > was actually triggering the interrupt.
>> > >
>> > > It also calls ep_cleanup_completed_requests and is iterating over the
>> > > started requests and will call giveback/complete functions of the
>> > > requests with the proper request status.
>> > >
>> > > So this will also catch missed requests in the queue. However, since
>> > > there might be, lets say 5 good requests and one missed request, what
>> > > will happen is, that each complete call for the first good requests will
>> > > enqueue new requests into the started list and will also call the
>> > > updatecmd on that transfer that was already missed until the loop will
>> > > reach the one request with the MISSED status bit set.
>> > >
>> > > So in my opinion the patch from Jeff makes sense when adding the
>> > > following change aswell. With those both changes the underruns and
>> > > broken frames finally disappear. I am still unsure about the complete
>> > > solution about that, since with this the mentioned 5 good requests
>> > > will be cancelled aswell. So this is still a WIP status here.
>> > >
>> >
>> > When the dwc3 driver issues stop_active_transfer(), that means that the
>> > started_list is empty and there is an underrun.
>>
>> At this moment this is only the case when both, pending and started list
>> are empty. Or the interrupt event was EXDEV.
>>
>> The main problem is that the function
>> dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
>> issue an complete for each started request, which on the other hand will
>> refill the pending list, and therefor after that refill the
>> stop_active_transfer is currently never hit.
>>
>> > It treats the incoming requests as staled. However, for UVC, they are
>> > still "good".
>>
>> Right, so in that case we can requeue them anyway. But this will have to
>> be done after the stop transfer cmd has finished.
>>
>> > I think you can just check if the started_list is empty before queuing
>> > new requests. If it is, perform stop_active_transfer() to reschedule the
>> > incoming requests. None of the newly queue requests will be released
>> > yet since they are in the pending_list.
>>
>> So that is basically exactly what my patch is doing. However in the case
>> of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
>> as jeff stated. So his underlying patch is really fixing an issue here.
>
>What I mean is to actively check for started list on every
>usb_ep_queue() call. Checking during
>dwc3_gadget_ep_cleanup_completed_requests() is already too late.
I see.
>>
>> > For UVC, perhaps you can introduce a new flag to usb_request called
>> > "ignore_queue_latency" or something equivalent. The dwc3 is already
>> > partially doing this for UVC. With this new flag, we can rework dwc3 to
>> > clearly separate the expected behavior from the function driver.
>>
>> I don't know why this "extra" flag is even necessary. The code example
>> is already working without that extra flag.
>
>The flag is for controller to determine what kinds of behavior the
>function driver expects. My intention is if this extra flag is not set,
>the dwc3 driver will not attempt to reshcedule isoc request at all (ie.
>no stop_active_transfer()).
Ok.
>>
>> Actually I even came up with an better solution. Additionally of checking if
>> one of the requests in the started list was missed, we can activly check if
>> the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
>> going to enqueue in to the empty trb ring.
>>
>> So my whole change looks like that:
>>
>> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
>> index efe6caf4d0e87..2c8047dcd1612 100644
>> --- a/drivers/usb/dwc3/core.h
>> +++ b/drivers/usb/dwc3/core.h
>> @@ -952,6 +952,7 @@ struct dwc3_request {
>> #define DWC3_REQUEST_STATUS_DEQUEUED 3
>> #define DWC3_REQUEST_STATUS_STALLED 4
>> #define DWC3_REQUEST_STATUS_COMPLETED 5
>> +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
>> #define DWC3_REQUEST_STATUS_UNKNOWN -1
>> u8 epnum;
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 858fe4c299b7a..a31f4d3502bd3 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>> req = next_request(&dep->cancelled_list);
>> dwc3_gadget_ep_skip_trbs(dep, req);
>> switch (req->status) {
>> + case 0:
>> + dwc3_gadget_giveback(dep, req, 0);
>> + break;
>> case DWC3_REQUEST_STATUS_DISCONNECTED:
>> dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
>> break;
>> @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
>> case DWC3_REQUEST_STATUS_STALLED:
>> dwc3_gadget_giveback(dep, req, -EPIPE);
>> break;
>> + case DWC3_REQUEST_STATUS_MISSED_ISOC:
>> + dwc3_gadget_giveback(dep, req, -EXDEV);
>> + break;
>> default:
>> dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
>> dwc3_gadget_giveback(dep, req, -ECONNRESET);
>> @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
>> return ret;
>> }
>> +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
>> +{
>> + struct dwc3_request *req;
>> + struct dwc3_request *tmp;
>> + int ret = 0;
>> +
>> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> + struct dwc3_trb *trb;
>> +
>> + trb = req->trb;
>> + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
>> + case DWC3_TRBSTS_MISSED_ISOC:
>> + /* Isoc endpoint only */
>> + ret = -EXDEV;
>> + break;
>> + case DWC3_TRB_STS_XFER_IN_PROG:
>> + /* Applicable when End Transfer with ForceRM=0 */
>> + case DWC3_TRBSTS_SETUP_PENDING:
>> + /* Control endpoint only */
>> + case DWC3_TRBSTS_OK:
>> + default:
>> + ret = 0;
>> + break;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +
>> static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
>> const struct dwc3_event_depevt *event, int status)
>> {
>> @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
>> {
>> struct dwc3 *dwc = dep->dwc;
>> bool no_started_trb = true;
>> + unsigned int transfer_in_flight = 0;
>> +
>> + /* It is possible that the interrupt thread was delayed by
>> + * scheduling in the system, and therefor the HW has already
>> + * run dry. In that case the last trb in the queue is already
>> + * handled by the hw. By checking the HWO bit we know to restart
>> + * the whole transfer. The condition to appear is more likelely
>> + * if not every trb has the IOC bit set and therefor does not
>> + * trigger the interrupt thread fewer.
>> + */
>> + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
>> + struct dwc3_trb *trb;
>> - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
>> + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
>> + }
>> - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> - goto out;
>> + if (status == -EXDEV || !transfer_in_flight) {
>> + struct dwc3_request *tmp;
>> + struct dwc3_request *req;
>> - if (!dep->endpoint.desc)
>> - return no_started_trb;
>> + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>> + dwc3_stop_active_transfer(dep, true, true);
>> - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> - list_empty(&dep->started_list) &&
>> - (list_empty(&dep->pending_list) || status == -EXDEV))
@[!!here!!]
>> - dwc3_stop_active_transfer(dep, true, true);
>> - else if (dwc3_gadget_ep_should_continue(dep))
>> - if (__dwc3_gadget_kick_transfer(dep) == 0)
>> - no_started_trb = false;
>> + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
>> + dwc3_gadget_move_cancelled_request(req,
>> + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
>> + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
>> + }
>> + } else {
>> + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
>> +
>> + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>> + goto out;
>> +
>> + if (!dep->endpoint.desc)
>> + return no_started_trb;
>> +
>> + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
>> + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
>> + dwc3_stop_active_transfer(dep, true, true);
>> + else if (dwc3_gadget_ep_should_continue(dep))
>> + if (__dwc3_gadget_kick_transfer(dep) == 0)
>> + no_started_trb = false;
>> + }
>> out:
>> /*
>>
>> I will seperate the whole hunk into smaller changes and send an v1
>> the next days to review.
>>
I finally sent a v1 of my series.
https://lore.kernel.org/linux-usb/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
For the rest of the discussion, I would like to move the conversation to
the newly sent series.
>No, we should not reschedule for every missed-isoc. We only want to
>target underrun condition.
As you stated above, by "reschedule" do you mean calling
stop_transfer after a missed transfer was seen?
If so, why is this condition in there already? (@[!!here!!])
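For reference, the condition marked above can be written as a stand-alone
predicate. This is only a user-space restatement of the removed lines;
old_should_stop_model() and its boolean parameters are hypothetical names
standing in for the list_empty() and usb_endpoint_xfer_isoc() checks.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Restates the removed check: stop the active transfer on an isoc
 * endpoint when the started list is empty and either the pending list
 * is empty as well or the event status was a missed isoc (-EXDEV).
 */
static bool old_should_stop_model(bool is_isoc, bool started_empty,
				  bool pending_empty, int status)
{
	return is_isoc && started_empty &&
	       (pending_empty || status == -EXDEV);
}
```

So the -EXDEV case only matters when requests are still pending, and it still
requires the started list to be empty.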
Michael
* Re: [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release after missed isoc
2024-03-07 16:15 ` Michael Grzeschik
@ 2024-03-08 2:47 ` Thinh Nguyen
0 siblings, 0 replies; 19+ messages in thread
From: Thinh Nguyen @ 2024-03-08 2:47 UTC (permalink / raw)
To: Michael Grzeschik
Cc: Thinh Nguyen, Dan Vacura, linux-usb@vger.kernel.org,
Daniel Scally, Jeff Vanhoof, stable@vger.kernel.org,
Greg Kroah-Hartman, Jonathan Corbet, Laurent Pinchart,
Felipe Balbi, Paul Elder, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org
On Thu, Mar 07, 2024, Michael Grzeschik wrote:
> On Thu, Mar 07, 2024 at 01:57:44AM +0000, Thinh Nguyen wrote:
> > On Tue, Feb 27, 2024, Michael Grzeschik wrote:
> > > On Thu, Feb 22, 2024 at 01:20:04AM +0000, Thinh Nguyen wrote:
> > > > On Thu, Feb 22, 2024, Michael Grzeschik wrote:
> > > > > For #2: I found an issue in the handling of the completion of requests in
> > > > > the started list. When the interrupt handler is *explicitly* calling
> > > > > stop_active_transfer if the overall event of the request was an missed
> > > > > event. This event value only represents the value of the request that
> > > > > was actually triggering the interrupt.
> > > > >
> > > > > It also calls ep_cleanup_completed_requests and is iterating over the
> > > > > started requests and will call giveback/complete functions of the
> > > > > requests with the proper request status.
> > > > >
> > > > > So this will also catch missed requests in the queue. However, since
> > > > > there might be, lets say 5 good requests and one missed request, what
> > > > > will happen is, that each complete call for the first good requests will
> > > > > enqueue new requests into the started list and will also call the
> > > > > updatecmd on that transfer that was already missed until the loop will
> > > > > reach the one request with the MISSED status bit set.
> > > > >
> > > > > So in my opinion the patch from Jeff makes sense when adding the
> > > > > following change aswell. With those both changes the underruns and
> > > > > broken frames finally disappear. I am still unsure about the complete
> > > > > solution about that, since with this the mentioned 5 good requests
> > > > > will be cancelled aswell. So this is still a WIP status here.
> > > > >
> > > >
> > > > When the dwc3 driver issues stop_active_transfer(), that means that the
> > > > started_list is empty and there is an underrun.
> > >
> > > At this moment this is only the case when both, pending and started list
> > > are empty. Or the interrupt event was EXDEV.
> > >
> > > The main problem is that the function
> > > dwc3_gadget_ep_cleanup_completed_requests(dep, event, status); will
> > > issue an complete for each started request, which on the other hand will
> > > refill the pending list, and therefor after that refill the
> > > stop_active_transfer is currently never hit.
> > >
> > > > It treats the incoming requests as staled. However, for UVC, they are
> > > > still "good".
> > >
> > > Right, so in that case we can requeue them anyway. But this will have to
> > > be done after the stop transfer cmd has finished.
> > >
> > > > I think you can just check if the started_list is empty before queuing
> > > > new requests. If it is, perform stop_active_transfer() to reschedule the
> > > > incoming requests. None of the newly queue requests will be released
> > > > yet since they are in the pending_list.
> > >
> > > So that is basically exactly what my patch is doing. However in the case
> > > of an underrun it is not safe to call dwc3_gadget_ep_cleanup_completed_requests
> > > as jeff stated. So his underlying patch is really fixing an issue here.
> >
> > What I mean is to actively check for started list on every
> > usb_ep_queue() call. Checking during
> > dwc3_gadget_ep_cleanup_completed_requests() is already too late.
>
> I see.
>
> > >
> > > > For UVC, perhaps you can introduce a new flag to usb_request called
> > > > "ignore_queue_latency" or something equivalent. The dwc3 is already
> > > > partially doing this for UVC. With this new flag, we can rework dwc3 to
> > > > clearly separate the expected behavior from the function driver.
> > >
> > > I don't know why this "extra" flag is even necessary. The code example
> > > is already working without that extra flag.
> >
> > The flag is for controller to determine what kinds of behavior the
> > function driver expects. My intention is if this extra flag is not set,
> > the dwc3 driver will not attempt to reshcedule isoc request at all (ie.
> > no stop_active_transfer()).
>
> Ok.
>
> > >
> > > Actually I even came up with an better solution. Additionally of checking if
> > > one of the requests in the started list was missed, we can activly check if
> > > the trb ring did run dry and if dwc3_gadget_endpoint_trbs_complete is
> > > going to enqueue in to the empty trb ring.
> > >
> > > So my whole change looks like that:
> > >
> > > diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> > > index efe6caf4d0e87..2c8047dcd1612 100644
> > > --- a/drivers/usb/dwc3/core.h
> > > +++ b/drivers/usb/dwc3/core.h
> > > @@ -952,6 +952,7 @@ struct dwc3_request {
> > > #define DWC3_REQUEST_STATUS_DEQUEUED 3
> > > #define DWC3_REQUEST_STATUS_STALLED 4
> > > #define DWC3_REQUEST_STATUS_COMPLETED 5
> > > +#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
> > > #define DWC3_REQUEST_STATUS_UNKNOWN -1
> > > u8 epnum;
> > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > index 858fe4c299b7a..a31f4d3502bd3 100644
> > > --- a/drivers/usb/dwc3/gadget.c
> > > +++ b/drivers/usb/dwc3/gadget.c
> > > @@ -2057,6 +2057,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> > > req = next_request(&dep->cancelled_list);
> > > dwc3_gadget_ep_skip_trbs(dep, req);
> > > switch (req->status) {
> > > + case 0:
> > > + dwc3_gadget_giveback(dep, req, 0);
> > > + break;
> > > case DWC3_REQUEST_STATUS_DISCONNECTED:
> > > dwc3_gadget_giveback(dep, req, -ESHUTDOWN);
> > > break;
> > > @@ -2066,6 +2069,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
> > > case DWC3_REQUEST_STATUS_STALLED:
> > > dwc3_gadget_giveback(dep, req, -EPIPE);
> > > break;
> > > + case DWC3_REQUEST_STATUS_MISSED_ISOC:
> > > + dwc3_gadget_giveback(dep, req, -EXDEV);
> > > + break;
> > > default:
> > > dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
> > > dwc3_gadget_giveback(dep, req, -ECONNRESET);
> > > @@ -3509,6 +3515,36 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
> > > return ret;
> > > }
> > > +static int dwc3_gadget_ep_check_missed_requests(struct dwc3_ep *dep)
> > > +{
> > > + struct dwc3_request *req;
> > > + struct dwc3_request *tmp;
> > > + int ret = 0;
> > > +
> > > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> > > + struct dwc3_trb *trb;
> > > +
> > > + trb = req->trb;
> > > + switch (DWC3_TRB_SIZE_TRBSTS(trb->size)) {
> > > + case DWC3_TRBSTS_MISSED_ISOC:
> > > + /* Isoc endpoint only */
> > > + ret = -EXDEV;
> > > + break;
> > > + case DWC3_TRB_STS_XFER_IN_PROG:
> > > + /* Applicable when End Transfer with ForceRM=0 */
> > > + case DWC3_TRBSTS_SETUP_PENDING:
> > > + /* Control endpoint only */
> > > + case DWC3_TRBSTS_OK:
> > > + default:
> > > + ret = 0;
> > > + break;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > static void dwc3_gadget_ep_cleanup_completed_requests(struct dwc3_ep *dep,
> > > const struct dwc3_event_depevt *event, int status)
> > > {
> > > @@ -3565,22 +3601,51 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
> > > {
> > > struct dwc3 *dwc = dep->dwc;
> > > bool no_started_trb = true;
> > > + unsigned int transfer_in_flight = 0;
> > > +
> > > + /* It is possible that the interrupt thread was delayed by
> > > + * scheduling in the system, and therefor the HW has already
> > > + * run dry. In that case the last trb in the queue is already
> > > + * handled by the hw. By checking the HWO bit we know to restart
> > > + * the whole transfer. The condition to appear is more likelely
> > > + * if not every trb has the IOC bit set and therefor does not
> > > + * trigger the interrupt thread fewer.
> > > + */
> > > + if (dep->number && usb_endpoint_xfer_isoc(dep->endpoint.desc)) {
> > > + struct dwc3_trb *trb;
> > > - dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> > > + trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
> > > + transfer_in_flight = trb->ctrl & DWC3_TRB_CTRL_HWO;
> > > + }
> > > - if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> > > - goto out;
> > > + if (status == -EXDEV || !transfer_in_flight) {
> > > + struct dwc3_request *tmp;
> > > + struct dwc3_request *req;
> > > - if (!dep->endpoint.desc)
> > > - return no_started_trb;
> > > + if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
> > > + dwc3_stop_active_transfer(dep, true, true);
> > > - if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > - list_empty(&dep->started_list) &&
> > > - (list_empty(&dep->pending_list) || status == -EXDEV))
>
> @[!!here!!]
>
> > > - dwc3_stop_active_transfer(dep, true, true);
> > > - else if (dwc3_gadget_ep_should_continue(dep))
> > > - if (__dwc3_gadget_kick_transfer(dep) == 0)
> > > - no_started_trb = false;
> > > + list_for_each_entry_safe(req, tmp, &dep->started_list, list) {
> > > + dwc3_gadget_move_cancelled_request(req,
> > > + (DWC3_TRB_SIZE_TRBSTS(req->trb->size) == DWC3_TRBSTS_MISSED_ISOC) ?
> > > + DWC3_REQUEST_STATUS_MISSED_ISOC : 0);
> > > + }
> > > + } else {
> > > + dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
> > > +
> > > + if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> > > + goto out;
> > > +
> > > + if (!dep->endpoint.desc)
> > > + return no_started_trb;
> > > +
> > > + if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
> > > + list_empty(&dep->started_list) && list_empty(&dep->pending_list))
> > > + dwc3_stop_active_transfer(dep, true, true);
> > > + else if (dwc3_gadget_ep_should_continue(dep))
> > > + if (__dwc3_gadget_kick_transfer(dep) == 0)
> > > + no_started_trb = false;
> > > + }
> > > out:
> > > /*
> > >
> > > I will separate the whole hunk into smaller changes and send a v1
> > > for review in the next few days.
> > >
>
> I finally sent a v1 of my series.
>
> https://lore.kernel.org/linux-usb/20240307-dwc3-gadget-complete-irq-v1-0-4fe9ac0ba2b7@pengutronix.de/
>
> For the rest of the discussion, I would like to move the conversation to
> the newly sent series.
I saw your pushes. Thanks. I'll review and move the discussion there.
>
> > No, we should not reschedule for every missed-isoc. We only want to
> > target underrun condition.
>
> As you stated above, by reschedule you mean calling
> stop_transfer after a missed transfer was seen?
>
> If so, why is this condition in there already? (@[!!here!!])
>
It's only to reschedule if started_list is empty _and_ if there's either
no pending request or there's a missed isoc. Not for every missed isoc.

BR,
Thinh
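
The condition described in the reply above (the one marked @[!!here!!] in the quoted hunk) can be sketched as a standalone predicate. This is only a simplified model for illustration, not driver code: struct ep_model and existing_should_stop() are hypothetical names, and the three booleans stand in for usb_endpoint_xfer_isoc() and the list_empty() checks on started_list and pending_list.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the endpoint state the check consults. */
struct ep_model {
	bool is_isoc;        /* models usb_endpoint_xfer_isoc(dep->endpoint.desc) */
	bool started_empty;  /* models list_empty(&dep->started_list) */
	bool pending_empty;  /* models list_empty(&dep->pending_list) */
};

/*
 * Models the pre-existing condition for calling
 * dwc3_stop_active_transfer(): the started list must be empty, and
 * additionally either nothing is pending or the status is -EXDEV
 * (missed isoc). A missed isoc alone is not sufficient.
 */
static bool existing_should_stop(const struct ep_model *ep, int status)
{
	return ep->is_isoc && ep->started_empty &&
	       (ep->pending_empty || status == -EXDEV);
}
```

With this model, a missed isoc while requests are still on the started list does not trigger a stop, which is the distinction drawn above between "every missed isoc" and the underrun case.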
Thread overview: 19+ messages
[not found] <20221017205446.523796-1-w36195@motorola.com>
2022-10-17 20:54 ` [PATCH] usb: gadget: uvc: fix dropped frame after missed isoc Dan Vacura
2022-10-18 1:50 ` Bagas Sanjaya
2022-10-18 2:15 ` Dan Vacura
2022-10-18 5:13 ` Greg Kroah-Hartman
2022-10-17 20:54 ` [PATCH v3 2/6] usb: dwc3: gadget: cancel requests instead of release " Dan Vacura
2022-10-17 21:30 ` Thinh Nguyen
2022-10-18 2:10 ` Dan Vacura
2022-10-18 18:45 ` Thinh Nguyen
2022-10-18 19:13 ` Michael Grzeschik
2022-10-18 22:45 ` Thinh Nguyen
2022-10-19 6:46 ` Michael Grzeschik
2024-02-22 0:02 ` Michael Grzeschik
2024-02-22 1:20 ` Thinh Nguyen
2024-02-27 21:01 ` Michael Grzeschik
2024-03-07 1:57 ` Thinh Nguyen
2024-03-07 16:15 ` Michael Grzeschik
2024-03-08 2:47 ` Thinh Nguyen
2022-10-17 20:54 ` [PATCH v3 3/6] usb: gadget: uvc: fix sg handling in error case Dan Vacura
2022-10-17 20:54 ` [PATCH v3 4/6] usb: gadget: uvc: fix sg handling during video encode Dan Vacura