Message-ID: <54B71DB9.10608@qca.qualcomm.com>
Date: Wed, 14 Jan 2015 17:54:01 -0800
From: Peter Oh
Subject: Re: Anyone seeing tx-credits 'hang'?
In-Reply-To: <54B6DE13.1080609@candelatech.com>
To: Ben Greear, Michal Kazior
Cc: ath10k

On 01/14/2015 01:22 PM, Ben Greear wrote:
> On 01/14/2015 12:50 PM, Peter Oh wrote:
>> On 01/14/2015 09:57 AM, Ben Greear wrote:
>>> On 01/14/2015 01:45 AM, Michal Kazior wrote:
>>>> On 13 January 2015 at 20:07, Ben Greear wrote:
>>>> [...]
>>>>> I managed to get some better debug out of the firmware.
>>>>>
>>>>> I am having a hell of a time figuring out how the code flows through all
>>>>> of the callbacks (in both firmware and driver), but it appears this is what happened:
>>>>>
>>>>> (I have instrumented transfer-id in both firmware and driver)
>>>>>
>>>>> firmware sent wmi message with transfer-id of 72.
>>>>> kernel received this transfer-id
>>>>> firmware's last send-callback transfer ID is 71.
>>>>>
>>>>> So, it seems that either ath10k did not do the transfer-complete logic,
>>>>> did it incorrectly, or the firmware did not notice it was done.
>>>>>
>>>>> I cannot find where the transfer complete code that should be updating
>>>>> firmware is at.
>>>>> If you know, can you point me to it?
>>>> I think the send-callback should be called when CE is simply done
>>>> doing its stuff. There's no need for the other side to ack anything
>>>> explicitly (it just needs to have a free buffer on its side so CE can
>>>> copy it over).
>>>>
>>>> Or maybe it is the HOST_IS_COPY_COMPLETE_MASK? Not really sure.
>>> I am now guessing that some magic IRQ happens when ath10k_ce_src_ring_write_index_set()
>>> is called.
>> You may have already noticed, but to clarify: the magic IRQ is a DMA interrupt. The Copy Engine is almost the same as a DMA engine with channels, which triggers an
>> interrupt automatically when a DMA transfer is completed. We have registers to enable it: HOST_IE (offset 0x2c) and TARGET_IE (offset 0x24).
>> ath10k_ce_src_ring_write_index_set (SRC_RING_WR_IND register, offset 0x3c) triggers fetching data automatically using DMA, by ASIC design.
> Yes, that makes sense, and I appreciate the extra details.
>
>>> I may have narrowed down the problem a bit further now.
>>>
>>> I printed out the ring indexes in firmware and driver when lockup
>>> occurred. The target -> host ring ids match fine, but I notice that
>>> it appears the firmware has pending entries in its host -> target wmi
>>> ring that it has not consumed.
>>>
>>> Maybe it missed an irq or has some related race.
>> Since the IRQ is a DMA interrupt triggered by the ASIC, all of the data must be transferred before the interrupt fires. If the IRQ does not happen even after
>> all the data has been transferred, then we may call it an ASIC bug; otherwise it could be a software issue. The corresponding status registers are TARGET_IS (offset 0x28)
>> and HOST_IS (offset 0x30), but I'm not sure which register reports the number of bytes that have been transferred. If we have this type of register, it will be
>> easy to determine whether DMA is done.
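
For readers following along, the register offsets quoted above and the "pending entries" check Ben describes can be sketched in C roughly as follows. Only the offsets come from this thread; the identifier names, the ring model, and the assumption that the ring size is a power of two are illustrative and are not taken from the ath10k driver or firmware sources.

```c
#include <assert.h>
#include <stdint.h>

/* Per-CE register offsets as quoted in this thread.
 * The CE_ prefix and enum name are hypothetical. */
enum ce_reg_offset {
    CE_TARGET_IE       = 0x24, /* target interrupt enable */
    CE_TARGET_IS       = 0x28, /* target interrupt status */
    CE_HOST_IE         = 0x2c, /* host interrupt enable   */
    CE_HOST_IS         = 0x30, /* host interrupt status   */
    CE_SRC_RING_WR_IND = 0x3c, /* source ring write index */
};

/* Hypothetical software model of one ring: the producer advances
 * write_index, the consumer advances read_index. */
struct ce_ring_model {
    uint32_t nentries;    /* assumed to be a power of two */
    uint32_t write_index;
    uint32_t read_index;
};

/* Number of entries written by the producer but not yet consumed.
 * Unsigned subtraction plus masking handles index wrap-around. */
uint32_t ce_ring_pending(const struct ce_ring_model *r)
{
    assert(r->nentries != 0 && (r->nentries & (r->nentries - 1)) == 0);
    return (r->write_index - r->read_index) & (r->nentries - 1);
}
```

A non-zero result on the host -> target WMI ring, with no interrupt pending in the status register, would correspond to the situation described above: entries the firmware has not consumed.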
> I found some things that look risky in the firmware CE code, but my attempts at
> fixing them made no improvement, so I am not sure I found any real problems in
> this area yet. I'll be happy to send you the firmware patches for my debugging
> efforts and such if you are interested.

Sure. I'd like to run your changes, but I cannot guarantee how much effort I can put into it, or by when.

>
> As for when bytes are fully read, see this firmware method:
>
> CE_completed_recv_next
>
> At this point, I am trying to make a work-around that will force a re-read of the ring
> buffer (basically, fake an interrupt).
>
>
> Back to the original attempt at debugging this...the problem was quite easy to reproduce
> before I started adding debugging to the firmware..and the debugging I have added is quite light on
> run-time behaviour, so I suspect some sort of race either in software or hardware.
>
> Hard to pin it down though.
>
> Out of curiosity, are you aware of anyone hitting this type of problem with upstream
> firmware?

Sorry, but I haven't seen other people report this issue.

> Thanks,
> Ben
>
>

Regards,
Peter
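
The "fake an interrupt" work-around mentioned above could be approximated by a periodic check that re-runs the ring handler when entries are pending but the read index has not moved. This is a minimal sketch under that assumption, not the actual firmware change: `struct ring_watch` and `ring_watch_should_kick` are hypothetical names, and requiring two consecutive stalled polls before kicking is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical watchdog state for one ring. */
struct ring_watch {
    uint32_t last_read_index; /* read index seen on the previous poll  */
    bool     stalled_once;    /* previous poll already saw no progress */
};

/* Call periodically with the current ring indexes. Returns true when
 * the caller should re-run the ring handler as if an interrupt had
 * fired: entries are pending and the read index did not advance over
 * two consecutive polls. */
bool ring_watch_should_kick(struct ring_watch *w,
                            uint32_t read_index, uint32_t write_index)
{
    assert(w != NULL);

    bool pending    = read_index != write_index;
    bool progressed = read_index != w->last_read_index;

    w->last_read_index = read_index;

    if (!pending || progressed) {
        w->stalled_once = false; /* ring is empty or still moving */
        return false;
    }
    if (!w->stalled_once) {
        w->stalled_once = true;  /* first stalled poll: wait one more */
        return false;
    }
    return true;                 /* second stalled poll: kick */
}
```

A poll interval that is long compared with normal interrupt latency keeps such a check from masking the underlying race while still recovering from a missed IRQ.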