From: Selvarasu Ganesan <selvarasu.g@samsung.com>
To: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jh0801.jung@samsung.com" <jh0801.jung@samsung.com>,
"dh10.jung@samsung.com" <dh10.jung@samsung.com>,
"naushad@samsung.com" <naushad@samsung.com>,
"akash.m5@samsung.com" <akash.m5@samsung.com>,
"rc93.raju@samsung.com" <rc93.raju@samsung.com>,
"taehyun.cho@samsung.com" <taehyun.cho@samsung.com>,
"hongpooh.kim@samsung.com" <hongpooh.kim@samsung.com>,
"eomji.oh@samsung.com" <eomji.oh@samsung.com>,
"shijie.cai@samsung.com" <shijie.cai@samsung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm
Date: Fri, 6 Sep 2024 05:58:18 +0530
Message-ID: <64d049cc-d55d-4376-b6b9-402eb6f170c0@samsung.com>
In-Reply-To: <20240905231825.6r2sp2bapxidur7a@synopsys.com>
On 9/6/2024 4:48 AM, Thinh Nguyen wrote:
> On Fri, Sep 06, 2024, Selvarasu Ganesan wrote:
>> On 9/6/2024 2:43 AM, Thinh Nguyen wrote:
>>> On Thu, Sep 05, 2024, Selvarasu Ganesan wrote:
>>>> On 9/5/2024 5:56 AM, Thinh Nguyen wrote:
>>>>> On Wed, Sep 04, 2024, Selvarasu Ganesan wrote:
>>>>>> On 9/4/2024 6:33 AM, Thinh Nguyen wrote:
>>>>>>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
>>>>>>>> I would like to reconfirm from our end that in our failure scenario, we
>>>>>>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
>>>>>>>> resume sequence is executed, and the dwc->pending_events flag is not
>>>>>>>> being set.
>>>>>>>>
>>>>>>> If the controller is stopped, no event is generated until it's restarted
>>>>>>> (i.e., you should not see GEVNTCOUNT updated after clearing
>>>>>>> DCTL.run_stop). If there's no event, no interrupt assertion should come
>>>>>>> from the controller.
>>>>>>>
>>>>>>> If pending_events is not set and you still see this failure, then it's
>>>>>>> likely that the controller had started, and the interrupt was generated
>>>>>>> by a controller event. This occurs along with the interrupt generated
>>>>>>> by the connection notification in your setup.
>>>>>> I completely agree. My discussion revolves around the handling of the
>>>>>> DWC3_EVENT_PENDING flag in all situations. The purpose of using this
>>>>>> flag is to prevent the processing of new events if an existing event is
>>>>>> still being processed. This flag is set in the top-half interrupt
>>>>>> handler and cleared at the end of the bottom-half handler.
>>>>>>
>>>>>> Now, consider the scenario where the bottom half is not scheduled and
>>>>>> a USB reconnect occurs. In this case, the interrupt line may be
>>>>>> unmasked in dwc3_event_buffers_setup(), and the USB controller begins
>>>>>> posting new events. The top-half interrupt handler sees the
>>>>>> DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without processing any
>>>>>> new events. However, the USB controller keeps asserting the interrupt
>>>>>> until the events are acknowledged.
>>>>>>
>>>>>> Please review the complete sequence involving the DWC3_EVENT_PENDING flag.
>>>>>>
>>>>>> My proposal is to clear the DWC3_EVENT_PENDING flag when the interrupt
>>>>>> line is unmasked in dwc3_event_buffers_setup(), in addition to clearing
>>>>>> it in the bottom-half handler. Clearing the flag in
>>>>>> dwc3_event_buffers_setup() does not appear to cause any harm; we have
>>>>>> implemented it as a temporary workaround in our test setup to prevent
>>>>>> IRQ storms.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Working scenarios:
>>>>>> ==================
>>>>>> 1. Top-half handler:
>>>>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>>>           return IRQ_HANDLED;
>>>>>>    b. Set the DWC3_EVENT_PENDING flag
>>>>>>    c. Mask the interrupt line
>>>>>>
>>>>>> 2. Bottom-half handler:
>>>>>>    a. Unmask the interrupt line
>>>>>>    b. Clear the DWC3_EVENT_PENDING flag
>>>>>>
>>>>>> Failure scenarios:
>>>>>> ==================
>>>>>> 1. Top-half handler:
>>>>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>>>           return IRQ_HANDLED;
>>>>>>    b. Set the DWC3_EVENT_PENDING flag
>>>>>>    c. Mask the interrupt line
>>>>> For the DWC3_EVENT_PENDING flag to be set at this point (before we start
>>>>> the controller), GEVNTCOUNT must not have been 0 after soft-disconnect,
>>>>> and pm_runtime_suspended() must have been false.
>>>> In the top half, where we set the DWC3_EVENT_PENDING flag, we also
>>>> acknowledge GEVNTCOUNT. Therefore, I don't think GEVNTCOUNT needs to be
>>>> non-zero until a new event occurs. In fact, when we printed GEVNTCOUNT
>>>> from a non-interrupt context at a point where DWC3_EVENT_PENDING was
>>>> already set, it read zero.
>>> For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you
>>> see it's zero, that means that it was already decremented by the driver.
>>>
>>> If the driver acknowledges GEVNTCOUNT, that means the events have been
>>> copied and are ready to be processed, and the bottom-half thread is
>>> scheduled. If it's for a stale event, I don't want it to be processed.
>>>
>>>>>> 2. Bottom half not scheduled:
>>>>> Why is the bottom-half not scheduled? Or do you mean it hasn't woken up
>>>>> yet before the next top-half coming?
>>>> In very rare cases on our platform, the CPU may be unable to schedule
>>>> the bottom half of the dwc3 interrupt because a workqueue lockup has
>>>> occurred on the same CPU that is trying to schedule the dwc3 threaded
>>>> interrupt. So yes, in this case the bottom-half handler hasn't woken up.
>>>> After the controller restarts, new events then trigger an IRQ storm: the
>>>> top half keeps returning IRQ_HANDLED because (evt->flags &
>>>> DWC3_EVENT_PENDING) is set, and since the CPU is stuck servicing
>>>> continuous interrupts, the bottom half is never scheduled.
>>>>
>>>>>> 3. USB reconnect: dwc3_event_buffers_setup()
>>>>>>    a. Unmask the interrupt line
>>>>> Do we know that the GEVNTCOUNT is non-zero before starting the
>>>>> controller again?
>>>> We confirmed by adding a debug message here that GEVNTCOUNT reads
>>>> zero.
>>>>>> 4. Continuous interrupts: top-half handler:
>>>>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>>>           return IRQ_HANDLED;
>>>>>>
>>>>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>>>           return IRQ_HANDLED;
>>>>>>
>>>>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
>>>>>>           return IRQ_HANDLED;
>>>>>> .....
>>>>>>
>>>>>> .....
>>>>>>
>>>>>> .....
>>>>>>
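The failure sequence above, and the proposed workaround of clearing DWC3_EVENT_PENDING in dwc3_event_buffers_setup(), can be illustrated with a small userspace model. This is a sketch under stated assumptions, not driver code: every model_* name is hypothetical, GEVNTCOUNT and GEVNTSIZ.INTMASK are reduced to a counter and a boolean, and the interrupt line is assumed to stay asserted while the count is non-zero and unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the DWC3_EVENT_PENDING handshake.
 * These names do not exist in the kernel; they only mirror the roles
 * of evt->flags, GEVNTCOUNT and GEVNTSIZ.INTMASK.
 */
#define MODEL_EVENT_PENDING 0x1u

struct model_evt {
	unsigned int flags; /* models evt->flags                 */
	uint32_t hw_count;  /* models GEVNTCOUNT (unacked bytes) */
	bool masked;        /* models GEVNTSIZ.INTMASK           */
};

/* Assumption: the line is asserted while unacked events exist and the
 * interrupt is not masked. */
static bool model_irq_asserted(const struct model_evt *e)
{
	return e->hw_count != 0 && !e->masked;
}

/* Top half, following steps 1a-1c and 4a above. */
static void model_top_half(struct model_evt *e)
{
	if (e->flags & MODEL_EVENT_PENDING)
		return;                 /* driver returns IRQ_HANDLED; nothing acked */
	if (e->hw_count == 0)
		return;                 /* driver returns IRQ_NONE */
	e->flags |= MODEL_EVENT_PENDING;
	e->masked = true;
	e->hw_count = 0;                /* events cached, GEVNTCOUNT acked */
}

/* Reconnect: dwc3_event_buffers_setup() unmasks the line. The
 * clear_pending argument models the proposed workaround. */
static void model_buffers_setup(struct model_evt *e, bool clear_pending)
{
	e->masked = false;
	if (clear_pending)
		e->flags &= ~MODEL_EVENT_PENDING;
}

/*
 * Top half runs once, the bottom half never runs, a reconnect unmasks
 * the line and the controller posts new events. Returns the number of
 * top-half invocations that follow, capped at max_irqs.
 */
static int model_failure_sequence(bool clear_pending, int max_irqs)
{
	struct model_evt e = { .flags = 0, .hw_count = 4, .masked = false };
	int irqs = 0;

	assert(max_irqs > 0);
	model_top_half(&e);                     /* step 1 */
	/* step 2: no bottom half, so the flag stays set */
	model_buffers_setup(&e, clear_pending); /* step 3 */
	e.hw_count = 4;                         /* new events after restart */

	while (irqs < max_irqs && model_irq_asserted(&e)) {
		model_top_half(&e);             /* step 4 */
		irqs++;
	}
	return irqs;
}
```

In this model, model_failure_sequence(false, 1000) hits the cap of 1000 because the top half never acknowledges the new events, while model_failure_sequence(true, 1000) returns 1: the first assertion caches and acknowledges the events and masks the line.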
>>>> Sure, I can try the proposed code changes in our test environment.
>>>>
>>>> However, I am not sure how these changes would prevent an IRQ storm when
>>>> the USB controller restarts while DWC3_EVENT_PENDING is still set. The
>>>> new code only runs once DWC3_EVENT_PENDING has been cleared, and that
>>>> never happens if the previous bottom half is not scheduled.
>>>>
>>>> Please correct me if my understanding is wrong.
>>> As I mentioned, I don't want the DWC3_EVENT_PENDING flag to be set due
>>> to a stale event. I want to ignore and skip processing any stale event.
>>>
>>> The DWC3_EVENT_PENDING should not be set by the time
>>> dwc3_event_buffers_setup() is called.
>>>
>>> Specifically, review this condition in my test patch:
>>>
>>> 	/*
>>> 	 * If the controller is halted, the event count is stale/invalid. Ignore
>>> 	 * them. This happens if the interrupt assertion is from an out-of-band
>>> 	 * resume notification.
>>> 	 */
>>> 	if (!dwc->pullups_connected && count) {
>>> 		dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
>>> 		return IRQ_HANDLED;
>>> 	}
>>>
>>> Let me know if the condition matches what's happening in your case.
>> Hi Thinh,
>>
>> Thanks for your continuous reviews and suggestions.
>>
>> The given condition does not match our case either.
>> As I mentioned at the start of this thread, please refer to the link
>> below for an older discussion of a similar issue from Samsung:
>>
>> https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@samsung.com/
>>
>>
>> The DWC3_EVENT_PENDING flag is set while count is 0, and a count of 0
>> means there are no interrupts to handle.
>>
>> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
>> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
>> (void *) cache = 0xFFFFFF8839F54080,
>> (unsigned int) length = 0x1000,
>> (unsigned int) lpos = 0x0,
>> (unsigned int) count = 0x0, (unsigned int) flags = 0x00000001,
>> (dma_addr_t) dma = 0x00000008BD7D7000,
>> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
>> (u64) android_kabi_reserved1 = 0x0),
>
> This is the info of the event buffer that was reset after the
> dwc3_event_buffers_setup(). I'm talking about the first time
> DWC3_EVENT_PENDING flag was set.
Yes, that is correct: the buffer had already been reset as part of
dwc3_event_buffers_setup().
I agree that your new code change below will prevent DWC3_EVENT_PENDING
from being set, and skip the bottom-half handler, when the controller
is halted and the event count is invalid.
Are you suspecting that, in our failure case, the DWC3_EVENT_PENDING flag
was set only in this scenario?
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89fc690fdf34..a525f7ea5886 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4490,6 +4490,17 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 		return IRQ_HANDLED;
 
 	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
+
+	/*
+	 * If the controller is halted, the event count is stale/invalid. Ignore
+	 * them. This happens if the interrupt assertion is from an out-of-band
+	 * resume notification.
+	 */
+	if (!dwc->pullups_connected && count) {
+		dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
+		return IRQ_HANDLED;
+	}
+
 	count &= DWC3_GEVNTCOUNT_MASK;
 	if (!count)
 		return IRQ_NONE;
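A minimal userspace sketch of the check in the diff above, for reasoning about its effect on the pending flag. The model_* names are hypothetical; writing count to GEVNTCOUNT is modeled as subtracting it from a counter, and the interrupt masking, event copying and DWC3_GEVNTCOUNT_MASK step of the real dwc3_check_event_buf() are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are driver APIs. */
#define MODEL_EVENT_PENDING 0x1u

enum model_ret { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED, MODEL_IRQ_WAKE_THREAD };

struct model_ctrl {
	bool pullups_connected; /* models dwc->pullups_connected */
	unsigned int flags;     /* models evt->flags             */
	uint32_t hw_count;      /* models GEVNTCOUNT             */
	uint32_t cached;        /* models evt->count             */
};

/* Order of checks follows the patched dwc3_check_event_buf() above. */
static enum model_ret model_check_event_buf(struct model_ctrl *c)
{
	uint32_t count;

	assert(c);
	if (c->flags & MODEL_EVENT_PENDING)
		return MODEL_IRQ_HANDLED;

	count = c->hw_count;

	/* Controller halted: the count is stale. Acknowledge it and return
	 * without setting the pending flag, so no bottom half runs. */
	if (!c->pullups_connected && count) {
		c->hw_count -= count;   /* writing count to GEVNTCOUNT acks it */
		return MODEL_IRQ_HANDLED;
	}

	if (!count)
		return MODEL_IRQ_NONE;

	c->cached = count;
	c->flags |= MODEL_EVENT_PENDING;
	c->hw_count -= count;           /* events cached, then acknowledged */
	return MODEL_IRQ_WAKE_THREAD;
}
```

With pullups_connected false and a non-zero count, the model returns MODEL_IRQ_HANDLED, drains the counter and leaves the pending flag clear; with pullups_connected true it caches the count, sets the flag and returns MODEL_IRQ_WAKE_THREAD.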
>
> By the time the interrupt storm below occurs, the count in the buffer has
> already been zeroed out.
>
>> IRQ Storm:
>> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>> ...
>> ...
>> ...
>>
>>
>> We are also fine with the code change below, which you suggested earlier:
>> https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@synopsys.com/
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 65500246323b..3c36dfdb88f0 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>  	 * irq event handler completes before caching new event to prevent
>>  	 * losing events.
>>  	 */
>> -	if (evt->flags & DWC3_EVENT_PENDING)
>> +	if (evt->flags & DWC3_EVENT_PENDING) {
>> +		if (!evt->count) {
>> +			u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>> +
>> +			if (!(reg & DWC3_GEVNTSIZ_INTMASK))
>> +				evt->flags &= ~DWC3_EVENT_PENDING;
>> +		}
>>  		return IRQ_HANDLED;
>> +	}
>>
>>
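The older suggestion quoted above can be modeled the same way, starting from the snapshot reported earlier (count = 0, flags = DWC3_EVENT_PENDING) with the line already unmasked by dwc3_event_buffers_setup(). Again, this is a hypothetical userspace sketch: the model_* names are not driver APIs, GEVNTSIZ.INTMASK is reduced to a boolean, and the line is assumed to stay asserted while the count is non-zero and unmasked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are driver APIs. */
#define MODEL_EVENT_PENDING 0x1u

enum model_ret { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED, MODEL_IRQ_WAKE_THREAD };

struct model_evt {
	unsigned int flags; /* models evt->flags       */
	uint32_t cached;    /* models evt->count       */
	uint32_t hw_count;  /* models GEVNTCOUNT       */
	bool masked;        /* models GEVNTSIZ.INTMASK */
};

/* Top half with the older suggestion applied to the pending check. */
static enum model_ret model_check_event_buf(struct model_evt *e)
{
	assert(e);
	if (e->flags & MODEL_EVENT_PENDING) {
		/* evt->count is 0 and INTMASK is clear: the suggestion treats
		 * the flag as stale and clears it. */
		if (!e->cached && !e->masked)
			e->flags &= ~MODEL_EVENT_PENDING;
		return MODEL_IRQ_HANDLED;
	}
	if (!e->hw_count)
		return MODEL_IRQ_NONE;
	e->cached = e->hw_count;
	e->flags |= MODEL_EVENT_PENDING;
	e->masked = true;
	e->hw_count = 0;            /* events cached, GEVNTCOUNT acked */
	return MODEL_IRQ_WAKE_THREAD;
}

/* Counts assertions until the line deasserts, capped at max_irqs. */
static int model_irqs_until_quiet(struct model_evt *e, int max_irqs)
{
	int n = 0;

	while (n < max_irqs && e->hw_count && !e->masked) {
		model_check_event_buf(e);
		n++;
	}
	return n;
}
```

From that state, model_irqs_until_quiet() returns 2: the first assertion only clears the stale flag, and the second caches the events, masks the line and returns MODEL_IRQ_WAKE_THREAD, i.e. a bottom half would be scheduled at that point, which is the step Thinh's reply below is concerned about.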
> I don't want the bottom-half to be scheduled in the beginning as it may
> come before the cleanup in dwc3_event_buffers_setup().
So you mean the above change for clearing DWC3_EVENT_PENDING is not
required, because your new change prevents DWC3_EVENT_PENDING from being
set before dwc3_event_buffers_setup()?
However, I don't see any harm in the above change, which clears
DWC3_EVENT_PENDING if it is already set with evt->count = 0.
Anyway, I will try your newly proposed change alone on our test setup
and will update you on the status.
>
> BR,
> Thinh