From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from mail2.candelatech.com ([208.74.158.173]) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1YsFxj-0000fC-1I for ath10k@lists.infradead.org; Tue, 12 May 2015 19:35:34 +0000
Message-ID: <55525666.9060900@candelatech.com>
Date: Tue, 12 May 2015 12:37:10 -0700
From: Ben Greear
MIME-Version: 1.0
Subject: Re: ath10k: freeze after disconnection on killer1525
References: <5550A5A4.8080908@gmx.com> <55511C61.9010907@candelatech.com>
In-Reply-To:
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Sender: "ath10k"
Errors-To: ath10k-bounces+kvalo=adurom.com@lists.infradead.org
To: Michal Kazior
Cc: Gabriele Martino, "ath10k@lists.infradead.org"

On 05/11/2015 09:52 PM, Michal Kazior wrote:
> On 11 May 2015 at 23:17, Ben Greear wrote:
>> On 05/11/2015 06:30 AM, Michal Kazior wrote:
>>> On 11 May 2015 at 14:50, Gabriele Martino wrote:
>>>> Hi,
>>>> I'm using a Killer 1525 with hw2.1 firmware, and sometimes it stop working.
>>>> I can get it working again disconnecting and reconnecting, but sometimes
>>>> on disconnection it freezes for a long time:
>>>>
>>>> [ 2740.035190] dmar: DRHD: handling fault status reg 2
>>>> [ 2740.035195] dmar: DMAR:[DMA Read] Request device [03:00.0] fault addr ffbeb000
>>>> DMAR:[fault reason 06] PTE Read access is not set
>>>
>>> This looks like DMA tx pool memory address. I suspect
>>> firmware/hardware tried to access memory which was already unmapped by
>>> ath10k.
>>>
>>> If you're feeling lucky you could disable IOMMU - this should prevent
>>> from crashing and disconnecting. However this is hardly a solution
>>> unless you're okay with the device reading random memory and doing
>>> *stuff* with it (plaintext password from RAM sent on the air, anyone?
>>> :-)
>>
>> I don't actually see a firmware crash here.
>> This looks a bit like the problem
>> I hit where the WMI transport basically hangs, but the firmware does not actually
>> crash. (I don't remember seeing any DMAR issues in my case, not sure if
>> that is significant or not.)
>
> Firmware won't necessarily crash. I guess it depends on IOMMU
> controller whether the device will actually crash per se and qca6174
> is a little more forgiving against faulted host memory access. qca988x
> tends to just crash outright if it gets a DMAR fault.

So the FW is just wedged in this case, and will not crash nor actually
handle commands properly? That sounds like the worst of any possible
combination!

I guess one would need to hack ath10k to detect the repeated WMI timeouts
and then attempt to restart the NIC?

>> I added some keep-alive messages, busy polling, and firmware watchdog logic
>> to my kernel and firmware that seem to have effectively worked around
>> this problem.
>>
>> My kernels also have work-arounds for the hangs (FW watchdog will kill truly hung
>> firmware in about 5 seconds and then system should recover normally).
>>
>> Gabriele: If you want to try my 3.17 kernel and CT firmware I'm curious to
>> see logs if you see similar problems.
>
> He's using qca6174, not qca988x. Your firmware does not apply in this case.

Ahh, my bad. Thanks for clarifying.

Thanks,
Ben

--
Ben Greear
Candela Technologies Inc http://www.candelatech.com

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
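
A minimal sketch of the idea raised in the message above: count repeated WMI timeouts and, past a threshold, ask for a restart of the NIC. This is not ath10k code and does not reflect how the driver or the CT firmware watchdog is implemented. The class name, method names, threshold, and restart callback are all hypothetical, and Python is used only to keep the logic short.

```python
class WmiTimeoutWatchdog:
    """Hypothetical model: request a restart after N consecutive timeouts."""

    def __init__(self, max_consecutive_timeouts, restart):
        if max_consecutive_timeouts < 1:
            raise ValueError("threshold must be at least 1")
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.restart = restart  # assumed callback that restarts the device
        self.consecutive = 0

    def command_completed(self):
        # Any command that completes shows the transport is alive,
        # so the run of timeouts is reset.
        self.consecutive = 0

    def command_timed_out(self):
        # Returns True if this timeout triggered a restart request.
        self.consecutive += 1
        if self.consecutive >= self.max_consecutive_timeouts:
            self.consecutive = 0
            self.restart()
            return True
        return False
```

Counting only consecutive timeouts, and resetting on any completed command, is one way to avoid restarting a device that is merely slow. A real driver would also have to decide what to do if the restart itself fails, which this sketch leaves out.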