From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
To: Selvarasu Ganesan <selvarasu.g@samsung.com>
Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jh0801.jung@samsung.com" <jh0801.jung@samsung.com>,
"dh10.jung@samsung.com" <dh10.jung@samsung.com>,
"naushad@samsung.com" <naushad@samsung.com>,
"akash.m5@samsung.com" <akash.m5@samsung.com>,
"rc93.raju@samsung.com" <rc93.raju@samsung.com>,
"taehyun.cho@samsung.com" <taehyun.cho@samsung.com>,
"hongpooh.kim@samsung.com" <hongpooh.kim@samsung.com>,
"eomji.oh@samsung.com" <eomji.oh@samsung.com>,
"shijie.cai@samsung.com" <shijie.cai@samsung.com>
Subject: Re: [PATCH] usb: dwc3: Potential fix of possible dwc3 interrupt storm
Date: Thu, 5 Sep 2024 23:18:27 +0000
Message-ID: <20240905231825.6r2sp2bapxidur7a@synopsys.com>
In-Reply-To: <f9561f03-5f83-4270-b7f3-17b880cfabfe@samsung.com>
On Fri, Sep 06, 2024, Selvarasu Ganesan wrote:
>
> On 9/6/2024 2:43 AM, Thinh Nguyen wrote:
> > On Thu, Sep 05, 2024, Selvarasu Ganesan wrote:
> >> On 9/5/2024 5:56 AM, Thinh Nguyen wrote:
> >>> On Wed, Sep 04, 2024, Selvarasu Ganesan wrote:
> >>>> On 9/4/2024 6:33 AM, Thinh Nguyen wrote:
> >>>>> On Mon, Sep 02, 2024, Selvarasu Ganesan wrote:
> >>>>>> I would like to reconfirm from our end that in our failure scenario, we
> >>>>>> observe that DWC3_EVENT_PENDING is set in evt->flags when the dwc3
> >>>>>> resume sequence is executed, and the dwc->pending_events flag is not
> >>>>>> being set.
> >>>>>>
> >>>>> If the controller is stopped, no event is generated until it's
> >>>>> restarted (i.e., you should not see GEVNTCOUNT updated after clearing
> >>>>> DCTL.run_stop). If there's no event, no interrupt assertion should come
> >>>>> from the controller.
> >>>>>
> >>>>> If pending_events is not set and you still see this failure, then it's
> >>>>> likely that the controller had started and the interrupt was generated
> >>>>> by a controller event. This occurs along with the interrupt generated
> >>>>> by the connection notification from your setup.
> >>>> I completely agree. My discussion revolves around the handling of the
> >>>> DWC3_EVENT_PENDING flag in all situations. The purpose of using this
> >>>> flag is to prevent the processing of new events if an existing event is
> >>>> still being processed. This flag is set in the top-half interrupt
> >>>> handler and cleared at the end of the bottom-half handler.
> >>>>
> >>>> Now, let's consider scenarios where the bottom half is not scheduled,
> >>>> and a USB reconnect occurs. In this case, there is a possibility that
> >>>> the interrupt line is unmasked in dwc3_event_buffers_setup, and the USB
> >>>> controller begins posting new events. The top-half interrupt handler
> >>>> checks for the DWC3_EVENT_PENDING flag and returns IRQ_HANDLED without
> >>>> processing any new events. However, the USB controller continues to post
> >>>> interrupts until they are acknowledged.
> >>>>
> >>>> Please review the complete sequence involving the DWC3_EVENT_PENDING
> >>>> flag.
> >>>>
> >>>> My proposal is to also clear the DWC3_EVENT_PENDING flag when the
> >>>> interrupt line is unmasked in dwc3_event_buffers_setup(), not only in
> >>>> the bottom-half handler. Clearing the flag in dwc3_event_buffers_setup()
> >>>> has not caused any harm in our testing; we have applied it as a
> >>>> temporary workaround in our test setup to prevent IRQ storms.
> >>>>
> >>>>
> >>>>
> >>>> Working scenarios:
> >>>> ==================
> >>>> 1. Top-half handler:
> >>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
> >>>>           return IRQ_HANDLED;
> >>>>    b. Set DWC3_EVENT_PENDING flag
> >>>>    c. Masking interrupt line
> >>>>
> >>>> 2. Bottom-half handler:
> >>>>    a. Un-masking interrupt line
> >>>>    b. Clear DWC3_EVENT_PENDING flag
> >>>>
> >>>> Failure scenarios:
> >>>> ==================
> >>>> 1. Top-half handler:
> >>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
> >>>>           return IRQ_HANDLED;
> >>>>    b. Set DWC3_EVENT_PENDING flag
> >>>>    c. Masking interrupt line
> >>> For the DWC3_EVENT_PENDING flag to be set at this point (before we start
> >>> the controller), GEVNTCOUNT must have been non-zero after soft-disconnect,
> >>> and pm_runtime_suspended() must have been false.
> >> In the top-half code where we set the DWC3_EVENT_PENDING flag, we also
> >> acknowledge GEVNTCOUNT. Therefore, I think GEVNTCOUNT need not remain
> >> non-zero until a new event occurs. In fact, when we printed GEVNTCOUNT
> >> in the non-interrupt context where we observed DWC3_EVENT_PENDING being
> >> set, we found that it was zero.
> > For DWC3_EVENT_PENDING to be set, GEVNTCOUNT must be non-zero. If you
> > see it's zero, that means that it was already decremented by the driver.
> >
> > If the driver acknowledges GEVNTCOUNT, that means the events have been
> > copied and prepared for processing, and the bottom-half thread is
> > scheduled. If it's for a stale event, I don't want it to be processed.
> >
> >>>> 2. No Bottom-half scheduled:
> >>> Why is the bottom-half not scheduled? Or do you mean it hasn't woken up
> >>> yet before the next top-half arrives?
> >> In very rare cases on our platform, the CPU may be unable to schedule
> >> the bottom half of the dwc3 interrupt because a workqueue lockup has
> >> occurred on the same CPU that is attempting to schedule the dwc3
> >> threaded interrupt. So yes, in this case the bottom-half handler hasn't
> >> woken up. After the controller restarts, new events then trigger an IRQ
> >> storm: the CPU is stuck servicing continuous interrupts, each of which
> >> returns IRQ_HANDLED because (evt->flags & DWC3_EVENT_PENDING) is set, so
> >> the bottom half is never scheduled.
> >>
> >>>> 3. USB reconnect: dwc3_event_buffers_setup
> >>>>    a. Un-masking interrupt line
> >>> Do we know that the GEVNTCOUNT is non-zero before starting the
> >>> controller again?
> >> We confirmed by adding a debug message here that the GEVNTCOUNT value
> >> is zero.
> >>>> 4. Continuous interrupts: Top-half handler:
> >>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
> >>>>           return IRQ_HANDLED;
> >>>>
> >>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
> >>>>           return IRQ_HANDLED;
> >>>>
> >>>>    a. if (evt->flags & DWC3_EVENT_PENDING)
> >>>>           return IRQ_HANDLED;
> >>>>    .....
> >>>>
> >>>>    .....
> >>>>
> >>>>    .....
> >>>>
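
A minimal user-space model of the failure sequence above, written as a sketch rather than driver code. Everything in it is a hypothetical simplification: the struct and helper names, the event byte counts, plain fields standing in for registers, and the assumption that the line stays asserted while GEVNTCOUNT is non-zero and the interrupt is unmasked. The `limit` argument stands in for a CPU that never gets out of the interrupt handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; names echo the thread, not driver code. */
#define MODEL_EVENT_PENDING 0x1u

struct evt_model {
	unsigned int gevntcount; /* event bytes posted by the controller */
	bool intmask;            /* true = interrupt masked (GEVNTSIZ.INTMASK) */
	unsigned int flags;      /* models evt->flags */
	unsigned int count;      /* models evt->count */
};

/* Assumed level-triggered line: asserted while events are unacknowledged
 * and the interrupt is unmasked. */
bool irq_asserted(const struct evt_model *m)
{
	return m->gevntcount != 0 && !m->intmask;
}

/* Top half, steps 1a-1c: return early if a previous batch is pending. */
void top_half(struct evt_model *m)
{
	if (m->flags & MODEL_EVENT_PENDING)
		return;                 /* "IRQ_HANDLED", nothing acknowledged */
	m->count = m->gevntcount;   /* cache the events */
	m->flags |= MODEL_EVENT_PENDING;
	m->intmask = true;          /* mask the interrupt line */
	m->gevntcount -= m->count;  /* acknowledge by writing count back */
}

/* Bottom half, steps 2a-2b: unmask and clear the flag. */
void bottom_half(struct evt_model *m)
{
	m->count = 0;
	m->intmask = false;
	m->flags &= ~MODEL_EVENT_PENDING;
}

/* Reconnect, step 3a: reset the buffer and unmask; flags are untouched. */
void buffers_setup(struct evt_model *m)
{
	m->count = 0;
	m->intmask = false;
}

/* Call the top half while the line stays asserted, at most `limit` times. */
int deliver_irqs(struct evt_model *m, int limit)
{
	int n = 0;

	assert(limit > 0);
	while (irq_asserted(m) && n < limit) {
		top_half(m);
		n++;
	}
	return n;
}

/* One event batch, an optional bottom half, a reconnect, then new events.
 * Returns the number of top-half calls after the reconnect. */
int simulate_reconnect(struct evt_model *m, bool bottom_half_runs, int limit)
{
	*m = (struct evt_model){ .gevntcount = 4 };
	deliver_irqs(m, limit);   /* flag set, events acknowledged */
	if (bottom_half_runs)
		bottom_half(m);       /* working scenario */
	buffers_setup(m);         /* USB reconnect */
	m->gevntcount = 8;        /* controller posts new events */
	return deliver_irqs(m, limit);
}
```

With the bottom half skipped, simulate_reconnect() hits the delivery limit and leaves count at 0 with PENDING still set, the same combination seen in the event-buffer dump later in this thread; when the bottom half runs first, a single top-half call caches and acknowledges the new events.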
> >> Sure, I can try implementing the proposed code modifications in our
> >> testing environment.
> >>
> >> However, I am uncertain how these changes would prevent an IRQ storm
> >> when the USB controller restarts with DWC3_EVENT_PENDING still set. The
> >> proposed code only executes once DWC3_EVENT_PENDING has been cleared,
> >> which does not happen if the previous bottom half was never scheduled.
> >>
> >> Please correct me if my understanding above is wrong.
> > As I mentioned, I don't want DWC3_EVENT_PENDING flag to be set due to
> > the stale event. I want to ignore and skip processing any stale event.
> >
> > The DWC3_EVENT_PENDING flag should not be set by the time
> > dwc3_event_buffers_setup() is called.
> >
> > Specifically, review this condition in my test patch:
> >
> > /*
> >  * If the controller is halted, the event count is stale/invalid. Ignore
> >  * them. This happens if the interrupt assertion is from an out-of-band
> >  * resume notification.
> >  */
> > if (!dwc->pullups_connected && count) {
> > 	dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
> > 	return IRQ_HANDLED;
> > }
> >
> > Let me know if this condition matches what's happening in your
> > case.
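
The interaction between this condition and the existing PENDING early return can be sketched in user space. This is an assumption-laden model, not the test patch: the stale-event check is placed after the PENDING and zero-count checks, registers are plain fields, and the enum values are stand-ins for irqreturn_t.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the driver or the test patch itself. */
#define MODEL_EVENT_PENDING 0x1u

enum model_ret { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED, MODEL_IRQ_WAKE_THREAD };

struct evt_model {
	unsigned int gevntcount; /* event bytes posted by the controller */
	bool pullups_connected;  /* models dwc->pullups_connected */
	unsigned int flags;      /* models evt->flags */
	unsigned int count;      /* models evt->count */
};

enum model_ret top_half(struct evt_model *m)
{
	unsigned int count;

	/* Existing early return: runs before the stale-event check. */
	if (m->flags & MODEL_EVENT_PENDING)
		return MODEL_IRQ_HANDLED;

	count = m->gevntcount;
	if (!count)
		return MODEL_IRQ_NONE;

	/* Controller halted: acknowledge the stale count and drop it without
	 * setting PENDING or waking the bottom half. */
	if (!m->pullups_connected) {
		m->gevntcount -= count;
		return MODEL_IRQ_HANDLED;
	}

	m->count = count;                /* cache the events */
	m->flags |= MODEL_EVENT_PENDING;
	m->gevntcount -= count;          /* acknowledge */
	return MODEL_IRQ_WAKE_THREAD;
}
```

Under these assumptions, a halted controller's stale count is acknowledged without setting PENDING, but a flag that is already set returns before the new check is ever reached.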
> Hi Thinh,
>
> Thanks for your continuous reviews and suggestions.
>
> This condition does not match our case either.
> As I mentioned at the start of this thread, please refer to the link
> below to an earlier discussion of a similar issue from Samsung.
>
> https://lore.kernel.org/linux-usb/20230102050831.105499-1-jh0801.jung@samsung.com/
>
>
> The DWC3_EVENT_PENDING flag is set while count is 0, which means
> "there are no interrupts to handle".
>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> (void *) cache = 0xFFFFFF8839F54080,
> (unsigned int) length = 0x1000,
> (unsigned int) lpos = 0x0,
> *(unsigned int) count = 0x0, (unsigned int) flags = 0x00000001,*
> (dma_addr_t) dma = 0x00000008BD7D7000,
> (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> (u64) android_kabi_reserved1 = 0x0),
This is the state of the event buffer after it was reset by
dwc3_event_buffers_setup(). I'm talking about the first time the
DWC3_EVENT_PENDING flag was set.
By the time the interrupt storm below occurs, the count in the buffer has
already been zeroed out.
>
> IRQ Storm:
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> ...
> ...
> ...
>
>
> We are also fine with the code change below, which you suggested earlier.
> https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@synopsys.com/
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 65500246323b..3c36dfdb88f0 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -5515,8 +5515,15 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  	 * irq event handler completes before caching new event to prevent
>  	 * losing events.
>  	 */
> -	if (evt->flags & DWC3_EVENT_PENDING)
> +	if (evt->flags & DWC3_EVENT_PENDING) {
> +		if (!evt->count) {
> +			u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> +
> +			if (!(reg & DWC3_GEVNTSIZ_INTMASK))
> +				evt->flags &= ~DWC3_EVENT_PENDING;
> +		}
>  		return IRQ_HANDLED;
> +	}
>
>
I don't want the bottom half to be scheduled at the beginning, as it may
run before the cleanup in dwc3_event_buffers_setup().
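
The quoted diff can be modeled the same way. In this hypothetical sketch (plain fields instead of registers, an assumed level-triggered line), PENDING with evt->count == 0 and the interrupt unmasked is treated as stale.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the quoted diff, not driver code. */
#define MODEL_EVENT_PENDING 0x1u

struct evt_model {
	unsigned int gevntcount; /* event bytes posted by the controller */
	bool intmask;            /* true = interrupt masked (GEVNTSIZ.INTMASK) */
	unsigned int flags;      /* models evt->flags */
	unsigned int count;      /* models evt->count */
};

/* Top half with the proposed check: PENDING set, no cached events and the
 * interrupt unmasked is treated as a stale flag and cleared. */
void top_half_patched(struct evt_model *m)
{
	if (m->flags & MODEL_EVENT_PENDING) {
		if (!m->count && !m->intmask)
			m->flags &= ~MODEL_EVENT_PENDING;
		return;                 /* this call still returns early */
	}
	m->count = m->gevntcount;   /* cache the events */
	m->flags |= MODEL_EVENT_PENDING;
	m->intmask = true;          /* mask the interrupt line */
	m->gevntcount -= m->count;  /* acknowledge by writing count back */
}

/* Assumed level-triggered line: keep calling the top half while events are
 * unacknowledged and unmasked, at most `limit` times. */
int deliver_irqs(struct evt_model *m, int limit)
{
	int n = 0;

	assert(limit > 0);
	while (m->gevntcount != 0 && !m->intmask && n < limit) {
		top_half_patched(m);
		n++;
	}
	return n;
}
```

Starting from the stuck state (PENDING set, count 0, unmasked, new events posted), the first call clears the flag and returns, and the second caches and acknowledges the events, so deliver_irqs() returns 2. A flag with cached events still outstanding is left alone. The model says nothing about the ordering concern raised above relative to dwc3_event_buffers_setup().
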
BR,
Thinh