From: Felipe Balbi <balbi@kernel.org>
To: Baolin Wang <baolin.wang@linaro.org>
Cc: Janusz Dziedzic <janusz.dziedzic@gmail.com>,
Greg KH <gregkh@linuxfoundation.org>,
USB <linux-usb@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Linaro Kernel Mailman List <linaro-kernel@lists.linaro.org>,
Mark Brown <broonie@kernel.org>
Subject: Re: [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler
Date: Thu, 05 Jan 2017 11:26:40 +0200 [thread overview]
Message-ID: <878tqpx3fz.fsf@linux.intel.com> (raw)
In-Reply-To: <CAMz4ku+_zr=C68py-8cXq_pyQC3LigQykHqOMjSkwXZucWw=3w@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 5008 bytes --]
Hi,
Baolin Wang <baolin.wang@linaro.org> writes:
>>>>>>> On 27 December 2016 at 18:52, Janusz Dziedzic <janusz.dziedzic@gmail.com> wrote:
>>>>>>>> 2016-12-26 9:01 GMT+01:00 Baolin Wang <baolin.wang@linaro.org>:
>>>>>>>>> On some platfroms(like x86 platform), when one core is running the USB gadget
>>>>>>>>> irq thread handler by dwc3_thread_interrupt(), meanwhile another core also can
>>>>>>>>> respond other interrupts from dwc3 controller and modify the event buffer by
>>>>>>>>> dwc3_interrupt() function, that will cause getting the wrong event count in
>>>>>>>>> irq thread handler to make the USB function abnormal.
>>>>>>>>>
>>>>>>>>> We should add spin_lock/unlock() in dwc3_check_event_buf() to avoid this race.
>>>>>>>>>
>>>>>>>> Interesting, I always think we mask interrupt in dwc3_interrupt() by setting
>>>>>>>> DWC3_GEVNTSIZ_INTMASK
>>>>>>>> And unmask interrupt when we end dwc3_thread_interrupt().
>>>>>>>>
>>>>>>>> So, we shouldn't get any IRQ from HW during dwc3_thread_interrupt(),
>>>>>>>> or I miss something?
>>>>>>>> Do you have some traces that indicate this masking will not work correctly?
>>>>>>>
>>>>>>> Yes, but we just masked the interrupts described in DEVTEN register,
>>>>>>> and we did not mask all the interrupts, like the endpoint command
>>>>>>> complete event, transfer complete event and so on, so we can still get
>>>>>>> interrupts.
>>>>>>
>>>>>> not true, we masked interrupts for the entire event buffer:
>>>>>
>>>>> Yes, you are right and I missed that. I should reproduce this problem
>>>>> and analyse the real reason.
>>>>>
>>>>>>
>>>>>>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>>>>>>> {
>>>>>>> struct dwc3 *dwc = evt->dwc;
>>>>>>> u32 count;
>>>>>>> u32 reg;
>>>>>>>
>>>>>>> if (pm_runtime_suspended(dwc->dev)) {
>>>>>>> pm_runtime_get(dwc->dev);
>>>>>>> disable_irq_nosync(dwc->irq_gadget);
>>>>>>> dwc->pending_events = true;
>>>>>>> return IRQ_HANDLED;
>>>>>>> }
>>>>>>>
>>>>>>> count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>>>> count &= DWC3_GEVNTCOUNT_MASK;
>>>>>>> if (!count)
>>>>>>> return IRQ_NONE;
>>>>>>>
>>>>>>> evt->count = count;
>>>>>>> evt->flags |= DWC3_EVENT_PENDING;
>>>>>>>
>>>>>>> /* Mask interrupt */
>>>>>>> reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>>>>>> reg |= DWC3_GEVNTSIZ_INTMASK;
>>>>>>
>>>>>> See here ?!?
>>>>>>
>>>>>>> dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>>>>>>
>>>>>>> return IRQ_WAKE_THREAD;
>>>>>>> }
>>>>>>
>>>>>>>> BTW, what value you get when problem occured, 0xFFFC?
>>>>>>>
>>>>>>> Yes, something like this, the event count become huge.
>>>>>>
>>>> Probably you have little bit different code than current community
>>>> version (depends how your PM works).
>>>>
>>>> This is possible when we write:
>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 0);
>>>> And after that
>>>> dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), 4);
>>>>
>>>> After that we will get 0xFFFC (-4).
>>>>
>>>> Possible races:
>>>> 1) dwc3_event_buffers_setup/dwc3_event_buffers_cleanup - write 0
>>>> 2) dwc3_thread - write 4
>>>>
>>>> While [1] could be called in PM work or UM context (init in Android
>>>> case) spin_lock_irqsave() will only disable local irqs and still we
>>>> could get IRQ on different core, next update evt->count and run
>>>> thread...
>>>
>>> Yeah, that's the possible races.
>>
>> and you have triggered this with mailine? How? We don't write to GEVNT*
>> registers from PM code and we only allow runtime_suspend with cable
>> dettached.
>
> Sorry for late reply since I am busy on other things. I just agreed
> with the possible races pointed by Janusz. I need to look at if these
> are happened on my platform and also I found some out of tree code
> which will clean the GEVNTCOUNT register when stop the gadget. I will
> check the mainline kernel and resend new patch if I make this problem
> clearly. Anyway thanks for your help and suggestion.
IOW, you sent me a patch to be integrated in the tree which everybody in
the whole world uses and you didn't even test anything in that very
tree? How am I supposed to trust you're sending me tested patches from
now on?
Clearly you have no empathy for those working countless hours to keep
this stable and working. If you're ready to send me a completely
untested patch and claim that it's fixing a race condition you have
never seen for yourself, it becomes difficult to trust any patches
you're sending me.
You should know better. Your employer has people on staff who are able
to clarify all these "details". You should never, ever send me patches
like this again. Mark, himself, has a long track record of contributing
to upstream development; maybe you should have a discussion with him
offline about this.
--
balbi
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-01-05 9:28 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-26 8:01 [PATCH] usb: dwc3: gadget: Avoid race between dwc3 interrupt handler and irq thread handler Baolin Wang
2016-12-27 2:39 ` Lu Baolu
2016-12-27 2:58 ` Baolin Wang
2016-12-27 4:45 ` Lu Baolu
2016-12-27 11:05 ` Felipe Balbi
2016-12-28 15:27 ` Janusz Dziedzic
2016-12-28 16:19 ` Felipe Balbi
2016-12-29 1:29 ` John Youn
2017-01-05 19:08 ` John Youn
2017-01-06 2:44 ` Baolin Wang
2016-12-27 10:52 ` Janusz Dziedzic
2016-12-27 11:06 ` Baolin Wang
2016-12-27 11:11 ` Felipe Balbi
2016-12-27 12:16 ` Baolin Wang
2016-12-28 12:30 ` Janusz Dziedzic
2017-01-03 12:21 ` Baolin Wang
2017-01-03 12:33 ` Felipe Balbi
2017-01-05 2:07 ` Baolin Wang
2017-01-05 9:26 ` Felipe Balbi [this message]
2017-01-05 9:43 ` Baolin Wang
2017-01-05 11:19 ` Felipe Balbi
2017-01-05 12:03 ` Baolin Wang
2016-12-27 11:07 ` Felipe Balbi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878tqpx3fz.fsf@linux.intel.com \
--to=balbi@kernel.org \
--cc=baolin.wang@linaro.org \
--cc=broonie@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=janusz.dziedzic@gmail.com \
--cc=linaro-kernel@lists.linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).