From: Ben Greear <greearb@candelatech.com>
To: Michal Kazior <michal.kazior@tieto.com>
Cc: "ath10k@lists.infradead.org" <ath10k@lists.infradead.org>,
	linux-wireless <linux-wireless@vger.kernel.org>
Subject: Re: [RFTv2 0/5] ath10k: ath10k: fix flushing and tx stalls
Date: Wed, 09 Apr 2014 22:26:42 -0700
Message-ID: <53462B92.7010408@candelatech.com>
In-Reply-To: <CA+BoTQnorw7Z8ve9pvGY-RgZP2N+RWNd14Xa_yxM3q1+7ryNfw@mail.gmail.com>

On 04/09/2014 10:10 PM, Michal Kazior wrote:
> On 10 April 2014 01:58, Ben Greear <greearb@candelatech.com> wrote:
>> On 04/09/2014 02:46 PM, Ben Greear wrote:
>>> Here's another log snippet with these 5 patches (and lots more
>>> mostly non ath10k patches of my own) applied:
>>
>> And another one, this time with more debugging enabled.
>> The 0x7110XXXX numbers indicate the command-id (the XXXX part
>> is the cmd id).
>>
>> After this below, I see a debug-log message come from
>> the firmware, and then nothing else.  I had added a sort
>> of keep-alive message in the firmware, and I do not see that
>> in my logs, so probably firmware is wedged in such a way that
>> it cannot or will not send packets to the host at this point.
>>
>> I had chased this sort of problem previously, and ended up
>> with a hack to reset firmware when the flush failed twice.
>> I backed that out when applying your patches, but I guess
>> it is still needed.
>
> Then this looks like a different issue from what I've been trying to
> fix actually.
>
> In my case when acting as AP it's possible to get WMI mgmt tx frames
> stuck in FW queues when sleeping client stops responding for about 10
> seconds. If you use up all tx credits (the multitude of 2 that there
> are :-) beaconing stops and everything just fails.

I see them stuck for a long while in STA mode too, but usually
it recovers.  I tried finding where in firmware I could increase
the tx-credits to 4 or something, but had no luck... 'tis a twisty
maze.


>> ath10k: ep 2 got 1 credits tot 2
>> ath10k: mac vdev 20 start 04:f0:21:03:38:99
>> ath10k: mac vdev 20 start center_freq 5180 phymode 11ac-vht80
>
>> ath10k: ep 2 used 1 credits, remaining 1 dbg 1896910867 (0x71109013)
>
> I suppose this print is located in ath10k_htc_send()?

Yes, right after decrementing credits.
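
For readers following the log, the dbg values can be split as described
earlier in the thread: the upper half is the 0x7110 tag and the lower 16
bits are the command id. A minimal sketch of that decoding, assuming the
0x7110XXXX layout holds for every dbg word; the helper names are
hypothetical and are not part of ath10k:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption from the thread: debug words look like 0x7110XXXX,
 * where the low 16 bits (XXXX) carry the command id. */
#define DBG_TAG 0x7110u

/* True if the upper 16 bits match the debug tag. */
static bool dbg_is_tagged(uint32_t dbg)
{
	return (dbg >> 16) == DBG_TAG;
}

/* Return the command id from the low 16 bits, or -1 if the word
 * does not carry the expected tag. */
static int dbg_cmd_id(uint32_t dbg)
{
	if (!dbg_is_tagged(dbg))
		return -1;
	return (int)(dbg & 0xffffu);
}
```

With this, the three dbg values in the log (1896910867, 1896910888 and
1896910878) decode to command ids 0x9013, 0x9028 and 0x901e, matching
the hex shown in parentheses on each line.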

>> ath10k: ep 2 got 1 credits tot 2
>> sta219: send auth to 04:f0:21:03:38:99 (try 1/3) at: 1397086238.721985
>> ath10k: ep 2 used 1 credits, remaining 1 dbg 1896910888 (0x71109028)
>> ath10k: mac flushing peer 04:f0:21:03:38:99 on vdev 20 mgmt tid for unicast mgmt (204 msecs)
>> ath10k: ep 2 used 1 credits, remaining 0 dbg 1896910878 (0x7110901e)
>> ath10k: Creating vdev id: 22  map: 12582912
>> ath10k: mac vdev create 22 (add interface) type 2 subtype 0
>> sta219: send auth to 04:f0:21:03:38:99 (try 2/3) at: 1397086239.28088
>> [firmware logging msg]
>> ath10k: failed to create WMI vdev 22: -11
>
> Hmm.. If I read this correctly it means that MGMT_TX and
> PEER_FLUSH_TIDS commands are both stuck in firmware. This most likely
> means firmware stops processing everything altogether. Having HTC
> debug prints from ath10k_htc_notify_tx_completion() could provide more
> insight perhaps. I suspect MGMT_TX is the trigger in all cases.
>
> I'm still suspicious of your firmware changes. You connect multiple
> stations to the exact same AP. Is peer mapping working correctly? Are
> tid queues mapped correctly in all cases? Perhaps there's some kind of
> inconsistency that leads to this mess? I think firmware wasn't
> originally designed to support your usecase. Or maybe firmware just
> breaks when you try to run a hundred or so of vdevs :-D

I have at least attempted to rectify all of that, but indeed this
particular lockup seems like a firmware issue.  I personally suspect
that I just find many bugs 32 times faster than simpler systems will :P

The firmware has its own sort of tx-to-host-credits logic, so if it runs
out of space it might not be able to send any messages back to
the host.  I've crawled through a lot of that code and didn't
see any obvious ways to leak buffers, but it's far from simple
code, so I could still be wrong.
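
The host-side log quoted above shows the same kind of starvation from
the other direction: once both credits on ep 2 are consumed and nothing
is returned, the next command fails with -11 (-EAGAIN on Linux). A
rough model of that accounting, as a sketch only: the struct and
function names are hypothetical, and it fails immediately instead of
waiting for credits as a real driver might:

```c
#include <errno.h>

/* Hypothetical model of one endpoint's tx credits; not driver code. */
struct ep_credits {
	int avail;	/* credits the host may still spend */
};

/* Peer handed back 'n' credits (cf. "ep 2 got 1 credits tot 2"). */
static void ep_credits_return(struct ep_credits *ep, int n)
{
	ep->avail += n;
}

/* Spend 'cost' credits for one command.
 * Returns 0 on success, -EAGAIN if the endpoint is starved. */
static int ep_credits_consume(struct ep_credits *ep, int cost)
{
	if (ep->avail < cost)
		return -EAGAIN;
	ep->avail -= cost;
	return 0;
}
```

Replaying the log with this model: starting from 1 credit, "got 1"
gives 2, "used 1" leaves 1, "got 1" gives 2 again, and two more "used 1"
lines leave 0. The following send then has nothing to spend, which is
where the vdev create fails.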

Maybe I could add a small scratch area in firmware memory and place debug
info there and read it from host over the PCI bus like when we
dump the crash info...  This time of night I really hate firmware :P
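
One possible shape for such a scratch area, purely as an illustration:
the layout, field names and magic value below are all invented for this
sketch, and how the host actually reads the bytes over PCI is left out.
The idea is that the host takes two snapshots some time apart and
checks whether a firmware-side heartbeat counter moved:

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary marker ("SCRH") so the host can tell the area is initialized. */
#define SCRATCH_MAGIC 0x53435248u

/* Hypothetical debug scratch layout written by firmware. */
struct fw_scratch {
	uint32_t magic;		/* SCRATCH_MAGIC once firmware set it up */
	uint32_t heartbeat;	/* bumped by firmware in its main loop */
	uint32_t last_cmd_id;	/* last command firmware started handling */
	uint32_t free_tx_bufs;	/* firmware-to-host buffers still free */
};

/* True if the snapshot looks like an initialized scratch area. */
static bool fw_scratch_valid(const struct fw_scratch *s)
{
	return s->magic == SCRATCH_MAGIC;
}

/* Compare two snapshots taken some time apart: a valid area whose
 * heartbeat did not advance suggests the firmware is wedged. */
static bool fw_scratch_stalled(const struct fw_scratch *older,
			       const struct fw_scratch *newer)
{
	return fw_scratch_valid(older) && fw_scratch_valid(newer) &&
	       older->heartbeat == newer->heartbeat;
}
```

A free_tx_bufs value of zero in a stalled snapshot would point toward
the buffer-leak theory, while a non-zero value would suggest looking
elsewhere.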

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
