From: Hemant Kumar <hemantk@codeaurora.org>
To: Paul Davey <Paul.Davey@alliedtelesis.co.nz>,
"quic_jhugo@quicinc.com" <quic_jhugo@quicinc.com>,
"bbhatt@codeaurora.org" <bbhatt@codeaurora.org>,
"loic.poulain@linaro.org" <loic.poulain@linaro.org>,
"manivannan.sadhasivam@linaro.org"
<manivannan.sadhasivam@linaro.org>
Cc: "linux-arm-msm@vger.kernel.org" <linux-arm-msm@vger.kernel.org>
Subject: Re: bus: mhi: parse_xfer_event running transfer completion callbacks more than once for a given buffer
Date: Thu, 2 Sep 2021 14:49:37 -0700 [thread overview]
Message-ID: <832109c4-9b38-4bc9-1b5c-aa43deae74d4@codeaurora.org> (raw)
In-Reply-To: <bbac581950a84aef245abff92660fd7c2b977d16.camel@alliedtelesis.co.nz>
Hi Paul,
On 8/31/2021 10:17 PM, Paul Davey wrote:
> On Fri, 2021-08-27 at 16:51 +1200, Paul Davey wrote:
>> On Thu, 2021-08-26 at 09:54 -0600, Jeffrey Hugo wrote:
>>> On 8/23/2021 12:47 AM, Paul Davey wrote:
>>>> Hi Hemant, Jeffery
>>>>
>>>> I have some more information after some testing.
>>>>
>>>>>> Do you have a log which prints the TRE being processed?
>>>>>> Basically i
>>>>>> am
>>>>>> trying understand this : by the time you get double free
>>>>>> issue,
>>>>>> is
>>>>>> there
>>>>>> any pattern with respect to the TRE that is being processed.
>>>>>> For
>>>>>> example
>>>>>> when host processed the given TRE for the first time with
>>>>>> RP1,
>>>>>> stale
>>>>>> TRE
>>>>>> was posted by Event RP2 right after RP1
>>>>>>
>>>>>> ->RP1 [TRE1]
>>>>>> ->RP2 [TRE1]
>>>>>>
>>>>>> or occurrence of stale TRE event is random?
>>>>
>>>> [...]
>>>
>>
>> Secondly we saw the following pattern in completion events:
>>
>> mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr:
>> 7c4004e0
>> mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr:
>> 7c400520
>> mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr:
>> 7c4004c0
>> mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr:
>> 7c4004b0
>> mhi mhi0: (IP_HW0_MBIM-Up) Completion Event code: 2 length: 5e2 ptr:
>> 7c4004a0
>>
>> Here we can see that instead of a completion event for 7c4004d0 we
>> have
>> one for 7c400520 which is significantly ahead of the other point and
>> from the list of TREs I store in mhi_gen_tre I suspect that 7c400520
>> is
>> the next TRE to be used in the TRE ring at this time, as the other
>> information shows it would be the oldest entry in that list. I am
>> not
>> sure what could have caused this but this is a different case to the
>> modem repeating the same TRE in a completion event.
>>
>
>
> I have considered the code further, and while I have seen cases of
> identical TRE completion events occurring, I do not think these result
> in the double free case because if the event is actually the same as
> the last one then the new dev_rp which parse_xfer_event will attempt to
> advance the local_rp to will already be equal to the local_rp and the
> whole loop will be skipped in the first place. The troublesome
> behaviour comes from the case I describe above where a jump to a
> "future" TRE's completion event seems to occur followed by a
> continuation of the order, this results in the tre_ring rp being
> advanced to that future TRE and then the next completion event
> following the previous ordered pattern would be before that rp location
> and the loop will run through the entire tre_ring to reach the new rp
> location.
>
>
> I do have another question though, the driver code seems to in some
> cases take the mhi_chan->lock when updating the doorbell register, but
> not when queueing new transfers. What is the actual purpose of this
> lock and why does it seem so inconsistently used? Is there any chance
> that some of my problems may be the result of queueing new transfers
> racing somehow with completion event processing?
Are you pointing to MHI_EV_CC_OOB and MHI_EV_CC_DB_MODE completion code
? To prove that race condition you disable burst mode as an experiment
In pci_generic.c
#define MHI_CHANNEL_CONFIG_HW_UL(ch_num, ch_name, el_count, ev_ring)
- .doorbell = MHI_DB_BRST_ENABLE
+ .doorbell = MHI_DB_BRST_DISABLE
I agree that locking is an issue in that case as we are taking
read/write channel lock in parse_xfer_event and using only pm_lock in
queue API.
Thanks,
Hemant
>
> Thanks,
> Paul
>
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project
prev parent reply other threads:[~2021-09-02 21:50 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-06 4:20 bus: mhi: parse_xfer_event running transfer completion callbacks more than once for a given buffer Paul Davey
2021-08-06 9:43 ` Loic Poulain
2021-08-13 22:55 ` Hemant Kumar
2021-08-13 23:10 ` Hemant Kumar
2021-08-16 13:48 ` Jeffrey Hugo
2021-08-15 23:30 ` Paul Davey
2021-08-23 6:47 ` Paul Davey
2021-08-26 15:54 ` Jeffrey Hugo
2021-08-27 4:51 ` Paul Davey
2021-09-01 5:17 ` Paul Davey
2021-09-02 21:49 ` Hemant Kumar [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=832109c4-9b38-4bc9-1b5c-aa43deae74d4@codeaurora.org \
--to=hemantk@codeaurora.org \
--cc=Paul.Davey@alliedtelesis.co.nz \
--cc=bbhatt@codeaurora.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=loic.poulain@linaro.org \
--cc=manivannan.sadhasivam@linaro.org \
--cc=quic_jhugo@quicinc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox