* More issues with ath10k_flush
From: Ben Greear @ 2014-06-05 18:47 UTC
To: ath10k

I'm back to debugging this charmer.

Currently I see the flush fail (and take 5 seconds doing so)
fairly often when creating lots of station vifs against my firmware.

Once stations are connected, there are usually no more timeouts,
even though I might be sending/receiving 100+Mbps of traffic for hours at
a time.

By printing out the firmware stats, I see that much of the time
the hardware has accepted X packets for transmission, but has completed
X-1.  It is possible the firmware's counters are screwed up somehow
or that it lost a packet, but I think it may also be possible that
the firmware is just being really slow about completing a packet
every now and then.  I have looked at the firmware in detail and
have found no way that it could actually leak tx descriptors.

So, I was thinking about changing the flush logic to try
the current flush (that just waits) for up to 1/5 of the
flush timeout, and if that fails, try telling the firmware to purge
its tx buffers, and then wait up to 4/5ths more of the
flush timeout.

Does that sound like a reasonable approach?

Currently, my workaround is just to restart firmware
after it fails to flush for 2 tries in a row; it seems
like there could be something better!

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
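The two-stage wait proposed in this message can be tried out in user space. What follows is only a minimal Python simulation under stated assumptions, not driver code: `two_phase_flush`, `FakeFirmware` and `run` are hypothetical names, the firmware is modelled as a counter of frames that never complete on their own, and the 5-second budget is the flush timeout quoted above. The clock is simulated so the example finishes instantly.

```python
import time
from typing import Callable


def two_phase_flush(pending: Callable[[], int],
                    purge: Callable[[], None],
                    timeout_s: float = 5.0,
                    poll_s: float = 0.25,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Wait, purge if needed, wait again.

    Returns 'drained', 'purged' or 'timeout'.
    """
    start = clock()
    soft_deadline = start + timeout_s / 5.0  # first 1/5: plain wait
    hard_deadline = start + timeout_s        # remaining 4/5: wait after purge

    def wait_until(deadline: float) -> bool:
        # Poll the pending count until it hits zero or the deadline passes.
        while pending() > 0:
            if clock() >= deadline:
                return False
            sleep(poll_s)
        return True

    if wait_until(soft_deadline):
        return "drained"
    purge()  # hypothetical "drop your tx buffers" request to the firmware
    if wait_until(hard_deadline):
        return "purged"
    return "timeout"


class FakeFirmware:
    """Hypothetical stand-in: frames stay stuck unless a purge works."""

    def __init__(self, stuck: int = 1, purge_works: bool = True) -> None:
        self.stuck = stuck
        self.purge_works = purge_works
        self.now = 0.0  # simulated seconds

    def pending(self) -> int:
        return self.stuck

    def purge(self) -> None:
        if self.purge_works:
            self.stuck = 0

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def run(fw: FakeFirmware) -> str:
    """Run the flush against the fake firmware using its simulated clock."""
    return two_phase_flush(fw.pending, fw.purge,
                           clock=fw.clock, sleep=fw.sleep)


if __name__ == "__main__":
    fw = FakeFirmware()
    print(run(fw), fw.now)
```

With one stuck frame and a working purge, the simulated flush gives up on the plain wait after one second instead of burning the whole five. If the purge has no effect, the result is "timeout" and a fallback such as the firmware restart mentioned above would still be needed.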
* Re: More issues with ath10k_flush
From: Ben Greear @ 2014-06-05 23:37 UTC
To: ath10k

On 06/05/2014 11:47 AM, Ben Greear wrote:
> I'm back to debugging this charmer.
>
> Currently I see the flush fail (and take 5 seconds doing so)
> fairly often when creating lots of station vifs against my firmware.
>
> Once stations are connected, there are usually no more timeouts,
> even though I might be sending/receiving 100+Mbps of traffic for hours at
> a time.
>
> By printing out the firmware stats, I see that much of the time
> the hardware has accepted X packets for transmission, but has completed
> X-1.  It is possible the firmware's counters are screwed up somehow
> or that it lost a packet, but I think it may also be possible that
> the firmware is just being really slow about completing a packet
> every now and then.  I have looked at the firmware in detail and
> have found no way that it could actually leak tx descriptors.
>
> So, I was thinking about changing the flush logic to try
> the current flush (that just waits) for up to 1/5 of the
> flush timeout, and if that fails, try telling the firmware to purge
> its tx buffers, and then wait up to 4/5ths more of the
> flush timeout.

After poking around, it seems there is no WMI command to tell
the firmware to just flush everything, so I hacked one into
my firmware, called it before ath10k_flush starts waiting,
and after several reboots, I do not see any timeouts trying
to flush.

So, maybe that will do the trick...other suggestions are
still welcome :)

Ben

>
> Does that sound like a reasonable approach?
>
> Currently, my workaround is just to restart firmware
> after it fails to flush for 2 tries in a row; it seems
> like there could be something better!
>
> Thanks,
> Ben
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
* Re: More issues with ath10k_flush
From: Michal Kazior @ 2014-06-06 5:16 UTC
To: Ben Greear; +Cc: ath10k

On 6 June 2014 01:37, Ben Greear <greearb@candelatech.com> wrote:
> On 06/05/2014 11:47 AM, Ben Greear wrote:
>> I'm back to debugging this charmer.
>>
>> Currently I see the flush fail (and take 5 seconds doing so)
>> fairly often when creating lots of station vifs against my firmware.
>>
>> Once stations are connected, there are usually no more timeouts,
>> even though I might be sending/receiving 100+Mbps of traffic for hours at
>> a time.
>>
>> By printing out the firmware stats, I see that much of the time
>> the hardware has accepted X packets for transmission, but has completed
>> X-1.  It is possible the firmware's counters are screwed up somehow
>> or that it lost a packet, but I think it may also be possible that
>> the firmware is just being really slow about completing a packet
>> every now and then.  I have looked at the firmware in detail and
>> have found no way that it could actually leak tx descriptors.

Interesting. This reminds me of the lazy wmi-htc tx credit replenishment
after wmi mgmt tx is completed. Maybe it's a similar sort of thing?
Maybe it's actually completed but for some reason the completion
hasn't been fully processed yet..

>> So, I was thinking about changing the flush logic to try
>> the current flush (that just waits) for up to 1/5 of the
>> flush timeout, and if that fails, try telling the firmware to purge
>> its tx buffers, and then wait up to 4/5ths more of the
>> flush timeout.

Sounds reasonable.

> After poking around, it seems there is no WMI command to tell
> the firmware to just flush everything, so I hacked one into
> my firmware, called it before ath10k_flush starts waiting,
> and after several reboots, I do not see any timeouts trying
> to flush.

I thought WMI_PEER_FLUSH_TIDS_CMDID is for that. It didn't work for
you? If so I would assume it's a firmware bug..

> So, maybe that will do the trick...other suggestions are
> still welcome :)

Did you try to find out what kind of frame is supposedly held? I
recall you've posted a NullFunc hexdump once pointing out that it's one of
the offending frames that didn't complete.

So.. maybe just not sending NullFunc frames (hell, they don't get a
proper ack status anyway..) or somehow altering how they are sent is
another way to work this around.


Michał
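One way to answer the "what kind of frame is held?" question is to classify whatever is still pending by its 802.11 frame control field. The sketch below is a standalone Python illustration, not anything from the driver: `classify_frame` and `summarize` are hypothetical helpers, and the input is assumed to be the raw bytes of each unfinished frame starting at the 802.11 header. The type and subtype masks are the standard frame control values (data type 0x0008, Null subtype 0x0040, QoS Null subtype 0x00C0).

```python
import struct
from collections import Counter
from typing import Iterable

# Frame control masks and values from the 802.11 header layout.
FCTL_FTYPE = 0x000C
FCTL_STYPE = 0x00F0
FTYPE_MGMT = 0x0000
FTYPE_CTL = 0x0004
FTYPE_DATA = 0x0008
STYPE_NULLFUNC = 0x0040
STYPE_QOS_NULLFUNC = 0x00C0


def classify_frame(frame: bytes) -> str:
    """Return a coarse label for one frame based on its first two bytes."""
    if len(frame) < 2:
        return "truncated"
    # Frame control is a 16-bit little-endian field at offset 0.
    (fc,) = struct.unpack_from("<H", frame)
    ftype = fc & FCTL_FTYPE
    stype = fc & FCTL_STYPE
    if ftype == FTYPE_DATA:
        if stype in (STYPE_NULLFUNC, STYPE_QOS_NULLFUNC):
            return "nullfunc"
        return "data"
    if ftype == FTYPE_MGMT:
        return "mgmt"
    if ftype == FTYPE_CTL:
        return "ctrl"
    return "other"


def summarize(frames: Iterable[bytes]) -> Counter:
    """Count pending frames per label."""
    return Counter(classify_frame(f) for f in frames)


if __name__ == "__main__":
    # Hypothetical pending list: one NullFunc, one ordinary data frame.
    stuck = [bytes([0x48, 0x01]), bytes([0x08, 0x02])]
    print(summarize(stuck))
```

If the summary of stuck frames is dominated by "nullfunc", that would support treating NullFunc frames specially, as suggested above. A mix of labels would point back at the flush path itself.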
* Re: More issues with ath10k_flush
From: Ben Greear @ 2014-06-06 14:49 UTC
To: Michal Kazior; +Cc: ath10k

On 06/05/2014 10:16 PM, Michal Kazior wrote:
> On 6 June 2014 01:37, Ben Greear <greearb@candelatech.com> wrote:
>> On 06/05/2014 11:47 AM, Ben Greear wrote:
>>> I'm back to debugging this charmer.
>>>
>>> Currently I see the flush fail (and take 5 seconds doing so)
>>> fairly often when creating lots of station vifs against my firmware.
>>>
>>> Once stations are connected, there are usually no more timeouts,
>>> even though I might be sending/receiving 100+Mbps of traffic for hours at
>>> a time.
>>>
>>> By printing out the firmware stats, I see that much of the time
>>> the hardware has accepted X packets for transmission, but has completed
>>> X-1.  It is possible the firmware's counters are screwed up somehow
>>> or that it lost a packet, but I think it may also be possible that
>>> the firmware is just being really slow about completing a packet
>>> every now and then.  I have looked at the firmware in detail and
>>> have found no way that it could actually leak tx descriptors.
>
> Interesting. This reminds me of the lazy wmi-htc tx credit replenishment
> after wmi mgmt tx is completed. Maybe it's a similar sort of thing?
> Maybe it's actually completed but for some reason the completion
> hasn't been fully processed yet..

I didn't see any reason for that to happen in the firmware, but
it is not the simplest code...

>>> So, I was thinking about changing the flush logic to try
>>> the current flush (that just waits) for up to 1/5 of the
>>> flush timeout, and if that fails, try telling the firmware to purge
>>> its tx buffers, and then wait up to 4/5ths more of the
>>> flush timeout.
>
> Sounds reasonable.

By flushing before we start waiting, maybe we don't need the
extra cleverness...but possibly it would be better to wait a
short bit of time and then flush firmware if we still have
pending skbs?

>> After poking around, it seems there is no WMI command to tell
>> the firmware to just flush everything, so I hacked one into
>> my firmware, called it before ath10k_flush starts waiting,
>> and after several reboots, I do not see any timeouts trying
>> to flush.
>
> I thought WMI_PEER_FLUSH_TIDS_CMDID is for that. It didn't work for
> you? If so I would assume it's a firmware bug..

Well, actually, the command may have worked...but instead of
iterating through all peers for all vdevs and making lots of WMI
calls, I just made the firmware do the iteration by passing
0xFFFFFFFF as the vdev-id and special-casing the firmware handling
of the message.  It was only about 8 extra lines of code in the
firmware...

I also noticed something where the firmware might not be flushing
its tids when a vdev goes down...I didn't bother to change that
yet, but possibly that is part of the issue.  (It only flushed if
the vdev was 'paused'...not sure why.)

>> So, maybe that will do the trick...other suggestions are
>> still welcome :)
>
> Did you try to find out what kind of frame is supposedly held? I
> recall you've posted a NullFunc hexdump once pointing out that it's one of
> the offending frames that didn't complete.
>
> So.. maybe just not sending NullFunc frames (hell, they don't get a
> proper ack status anyway..) or somehow altering how they are sent is
> another way to work this around.

I haven't tried printing them lately...and if the flush logic
continues to work, I probably won't bother...

In the past, I know there were sometimes lots of larger frames as
well, but possibly that was a separate issue, as I have not seen
more than one frame hung lately.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
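The wildcard vdev-id trick described in this message can be sketched as a toy model. This is a Python illustration under assumptions, not the actual firmware change: `Peer`, `handle_flush_cmd` and `host_commands_needed` are hypothetical names, and each peer is modelled as a map from TID to a count of queued frames. Only the 0xFFFFFFFF value comes from the thread.

```python
from typing import Dict, List

WILDCARD_VDEV_ID = 0xFFFFFFFF  # "all vdevs" value mentioned in the thread


class Peer:
    """Hypothetical peer with one tx queue depth per TID."""

    def __init__(self, name: str, queued_per_tid: Dict[int, int]) -> None:
        self.name = name
        self.queued_per_tid = dict(queued_per_tid)

    def flush_tids(self) -> int:
        """Drop everything queued for this peer; return how many frames."""
        dropped = sum(self.queued_per_tid.values())
        for tid in self.queued_per_tid:
            self.queued_per_tid[tid] = 0
        return dropped


def handle_flush_cmd(vdevs: Dict[int, List[Peer]], vdev_id: int) -> int:
    """Firmware-side handler: flush one vdev, or all of them on wildcard."""
    if vdev_id == WILDCARD_VDEV_ID:
        targets = list(vdevs)      # special case: iterate every vdev
    elif vdev_id in vdevs:
        targets = [vdev_id]
    else:
        targets = []               # unknown vdev: nothing to flush
    return sum(peer.flush_tids() for vid in targets for peer in vdevs[vid])


def host_commands_needed(vdevs: Dict[int, List[Peer]],
                         use_wildcard: bool) -> int:
    """Commands the host sends: one per peer, or a single wildcard."""
    if use_wildcard:
        return 1
    return sum(len(peers) for peers in vdevs.values())


if __name__ == "__main__":
    vdevs = {
        0: [Peer("a", {0: 2, 5: 1})],
        1: [Peer("b", {0: 0}), Peer("c", {6: 4})],
    }
    print(host_commands_needed(vdevs, use_wildcard=False))
    print(handle_flush_cmd(vdevs, WILDCARD_VDEV_ID))
```

The design point is that the iteration moves from the host to the firmware. The host sends one command regardless of how many station vifs exist, which matters most in the many-vif setup where the timeouts were observed.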