linux-wireless.vger.kernel.org archive mirror
* system hang with backports-20150511/20150525
From: Marty Faltesek @ 2015-06-01  1:37 UTC (permalink / raw)
  To: linux-wireless; +Cc: Martin Faltesek

Starting with backports-20150511, and continuing with
backports-20150525, we see frequent system hangs. backports-20150424
had no issue.

After the freeze, both the console and the network stack are
unresponsive (ssh and ping do not work). Using sysrq, I can see log
messages continuing from ath10k_pci after the freeze, along with some
other threads as well.

mac80211/ath10k/cfg80211 are the only modules in use from backports,
so a deadlock in mac80211 or ath10k seems possible.

LOCKDEP didn't reveal anything.

Using a 3.2.26 kernel on ARM. AP mode. No encryption.

I've collected ftrace events for sched mac80211 net napi cfg80211
workqueue, which are included in the dmesg. Because of its size, I've
uploaded it here:

http://tinyurl.com/dmesg-ftrace
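
For anyone trying to reproduce this capture, here is a minimal sketch of how those event subsystems could be switched on. It is not the script used for the logs above. It assumes debugfs is mounted at /sys/kernel/debug and that each subsystem exposes the usual events/<subsystem>/enable file; the function name enable_trace_events is made up for illustration.

```shell
#!/bin/sh
# Sketch only: enable whole ftrace event subsystems by writing 1 to
# <tracing_dir>/events/<subsystem>/enable.
# Assumption: tracing_dir is the ftrace directory, typically
# /sys/kernel/debug/tracing when debugfs is mounted.
enable_trace_events() {
    tracing_dir="$1"
    shift
    for subsys in "$@"; do
        enable_file="$tracing_dir/events/$subsys/enable"
        if [ -w "$enable_file" ]; then
            echo 1 > "$enable_file"
        else
            # Subsystem missing (not built in, module not loaded) or no permission.
            echo "skipping $subsys: $enable_file not writable" >&2
        fi
    done
}

# Example (as root):
# enable_trace_events /sys/kernel/debug/tracing \
#     sched mac80211 net napi cfg80211 workqueue
```

Subsystems that are not present are skipped with a warning rather than aborting, since the mac80211 and cfg80211 events only appear once those modules are loaded.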

In the logs, the last timestamp that my test script wrote is:

[ 1021.291495] hbeat0352

I've captured ftrace events before and after 1021.291495.

Thanks,
Marty Faltesek
Google Fiber


* Re: system hang with backports-20150511/20150525
From: Michal Kazior @ 2015-06-01  6:36 UTC (permalink / raw)
  To: Marty Faltesek
  Cc: linux-wireless, Martin Faltesek, ath10k@lists.infradead.org

+ath10k list

On 1 June 2015 at 03:37, Marty Faltesek <mfaltesek@google.com> wrote:
> Starting with backports-20150511, and continuing with
> backports-20150525, we see frequent system hangs. backports-20150424
> had no issue.

I don't see such binary releases on
https://backports.wiki.kernel.org/index.php/Main_Page
Hence I don't know what kernel you've backported the drivers from and
I can't compare anything.

Can you provide more details, please?


> After the freeze, the console is non-responsive, as well as the
> network stack (ssh/ping does not work). Using sysrq, I can see log
> messages continuing from ath10k_pci after the freeze, along with some
> other threads as well.

You are probably referring to:

[ 1026.951643] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951674] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon
[ 1026.951698] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 0, skipped old beacon

What puzzles me are these timestamps. SWBA events are generated by
firmware (and sent to the host) every beacon interval, which is ~100ms
in most cases. In your case, however, I can see a burst of at least 10
SWBA events within 1ms. Either the top half (irq) or the bottom half
(tasklet) got stuck for some time.
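
One way to check a saved dmesg for such bursts is sketched below. It is only an illustration: the function name swba_bursts and the default 1ms window are assumptions, and it expects the usual "[ seconds.micros]" printk timestamp prefix on every line. It groups consecutive "SWBA overrun" lines whose timestamps fall within the window of the first line in the group, and reports groups of two or more.

```shell
#!/bin/sh
# Sketch only: report bursts of "SWBA overrun" messages in a dmesg dump
# read from stdin. $1 is the burst window in seconds (default 0.001).
swba_bursts() {
    awk -v window="${1:-0.001}" '
    /SWBA overrun/ {
        # Pull the printk timestamp out of "[ 1026.951643]".
        if (match($0, /\[ *[0-9]+\.[0-9]+\]/)) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (count > 0 && ts - start <= window) {
                count++
            } else {
                if (count > 1)
                    printf "%d events starting at %.6f\n", count, start
                start = ts
                count = 1
            }
        }
    }
    END {
        if (count > 1)
            printf "%d events starting at %.6f\n", count, start
    }'
}

# Example:
# swba_bursts 0.001 < dmesg.txt
```

With a normal ~100ms beacon interval this prints nothing, so any output points at events that were delivered back to back.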

It could be useful if you could enable ath10k debugging with
debug_mask=0xffffff3f (this could generate a lot of messages if you're
running traffic through ath10k).
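
A minimal sketch of setting the mask at runtime follows. It assumes the parameter is exposed by the ath10k_core module at /sys/module/ath10k_core/parameters/debug_mask and is writable by root; if that does not hold on your build, passing debug_mask=0xffffff3f when loading ath10k_core is the alternative. The function name set_ath10k_debug_mask is made up for illustration.

```shell
#!/bin/sh
# Sketch only: write a debug mask to the ath10k module parameter file.
# $1: mask value, e.g. 0xffffff3f
# $2: parameter file (assumed default shown below)
set_ath10k_debug_mask() {
    mask="$1"
    param_file="${2:-/sys/module/ath10k_core/parameters/debug_mask}"
    if [ ! -w "$param_file" ]; then
        echo "cannot write $param_file (module not loaded, or not root?)" >&2
        return 1
    fi
    echo "$mask" > "$param_file"
}

# Example (as root):
# set_ath10k_debug_mask 0xffffff3f
```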


> mac80211/ath10k/cfg80211 are the only modules in use from backports,
> so it seems like a deadlock  could possibly be with mac80211 or
> ath10k.
>
> LOCKDEP didn't reveal anything.

You might want to try tuning /proc/sys/kernel/hung_task_timeout_secs
down (e.g. to 5 or 10 seconds) and see what happens when you hit the
problem.
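
As a rough sketch, lowering the timeout could look like this. It assumes the kernel was built with CONFIG_DETECT_HUNG_TASK, otherwise the file does not exist; the function name set_hung_task_timeout is made up for illustration.

```shell
#!/bin/sh
# Sketch only: lower the hung-task detector timeout.
# $1: timeout in whole seconds, e.g. 10
# $2: sysctl file (default is the path mentioned above)
set_hung_task_timeout() {
    secs="$1"
    sysctl_file="${2:-/proc/sys/kernel/hung_task_timeout_secs}"
    case "$secs" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    if [ ! -w "$sysctl_file" ]; then
        echo "cannot write $sysctl_file (CONFIG_DETECT_HUNG_TASK off, or not root?)" >&2
        return 1
    fi
    echo "$secs" > "$sysctl_file"
}

# Example (as root):
# set_hung_task_timeout 10
```

Note that the detector only reports tasks stuck in uninterruptible sleep, so a silent result does not rule out a spin in irq or tasklet context.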


> Using a 3.2.26 kernel on ARM. AP mode. No encryption.
>
> I've collected ftrace events for sched mac80211 net napi cfg80211
> workqueue, which are included in the dmesg you can find here because
> of its size:
>
> http://tinyurl.com/dmesg-ftrace
>
> In the logs, the last timestamp that my test script wrote is:
>
> [ 1021.291495] hbeat0352
>
> I've captured  ftrace events before and after 1021.291495.

Your dmesg looks really messy, so I'm not sure whether the SWBA events
really came in a burst or not.


Michał


* Re: system hang with backports-20150511/20150525
From: Kalle Valo @ 2015-06-01  7:13 UTC (permalink / raw)
  To: Michal Kazior
  Cc: Marty Faltesek, Martin Faltesek, linux-wireless,
	ath10k@lists.infradead.org

Michal Kazior <michal.kazior@tieto.com> writes:

> +ath10k list
>
> On 1 June 2015 at 03:37, Marty Faltesek <mfaltesek@google.com> wrote:
>> Starting with backports-20150511, and continuing with
>> backports-20150525, we see frequent system hangs. backports-20150424
>> had no issue.
>
> I don't see such binary releases on
> https://backports.wiki.kernel.org/index.php/Main_Page
> Hence I don't know what kernel you've backported the drivers from and
> I can't compare anything.
>
> Can you provide more details, please?

I suspect it's from here:

https://www.kernel.org/pub/linux/kernel/projects/backports/2015/05/25/

The backports project pages are a bit confusing and that location is
hard to find.

-- 
Kalle Valo


* Re: system hang with backports-20150511/20150525
From: Michal Kazior @ 2015-06-01  8:27 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Marty Faltesek, Martin Faltesek, linux-wireless,
	ath10k@lists.infradead.org

On 1 June 2015 at 09:13, Kalle Valo <kvalo@qca.qualcomm.com> wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> +ath10k list
>>
>> On 1 June 2015 at 03:37, Marty Faltesek <mfaltesek@google.com> wrote:
>>> Starting with backports-20150511, and continuing with
>>> backports-20150525, we see frequent system hangs. backports-20150424
>>> had no issue.
>>
>> I don't see such binary releases on
>> https://backports.wiki.kernel.org/index.php/Main_Page
>> Hence I don't know what kernel you've backported the drivers from and
>> I can't compare anything.
>>
>> Can you provide more details, please?
>
> I suspect it's from here:
>
> https://www.kernel.org/pub/linux/kernel/projects/backports/2015/05/25/
>
> The backports project pages are a bit confusing and that location is
> hard to find.

Oh, thanks!

Hmm.. There were a ton of changes between 20150424 and 20150511. For
one, ath10k started to use the chanctx API and FAST_XMIT. But it's not
a given that these two are to blame.

The latter can be easily disabled by removing
IEEE80211_HW_SUPPORT_FAST_XMIT from ar->hw->flags in ath10k's mac.c.
The former.. not so easy. It'd be awesome if you could do a git bisect.
The commit ids are a3da0fb6 (good) and f17107c (bad) (you need the
linux-next git repo including its tags for these ids to be resolvable).


Michał


* Re: system hang with backports-20150511/20150525
From: Marty Faltesek @ 2015-06-01 19:42 UTC (permalink / raw)
  To: Michal Kazior
  Cc: Kalle Valo, Martin Faltesek, linux-wireless,
	ath10k@lists.infradead.org

I had disabled IEEE80211_HW_SUPPORT_FAST_XMIT before and still saw the
hang. I repeated the test today to confirm.

I added the extra ath10k debug flags you requested. With them enabled,
the system resets without any messages, very soon after the last hbeat
timestamp. I've uploaded the log "crash.6.1.15.13.46" to
http://tinyurl.com/dmesg-ftrace.

Any advice on how to bisect when using backports?



* Re: system hang with backports-20150511/20150525
From: Michal Kazior @ 2015-06-02  5:20 UTC (permalink / raw)
  To: Marty Faltesek
  Cc: Kalle Valo, Martin Faltesek, linux-wireless,
	ath10k@lists.infradead.org

On 1 June 2015 at 21:42, Marty Faltesek <mfaltesek@google.com> wrote:
> I disabled IEEE80211_HW_SUPPORT_FAST_XMIT before, and still saw the
> hang. I repeated today to confirm.

Thanks for checking.


> I added the extra ath10k debug flags you requested, and it causes a
> system reset without any messages, very soon after the last hbeat
> timestamp. I've uploaded  log "crash.6.1.15.13.46" to
> http://tinyurl.com/dmesg-ftrace.

I guess the serial console gave up, which isn't really surprising :(
Thanks for checking anyway.


> Any advice on how to bisect when using backports?

Sure. Generally you'll need to do the bisect on your linux-next tree
which you use to generate backports:

 git clone git://git.kernel.org/pub/scm/linux/kernel/git/backports/backports.git
 git clone git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
 cd linux-next
 git bisect start
 git bisect good a3da0fb6
 git bisect bad f17107c
 # repeat:
 cd ../backports
 ./gentree.py --clean --git-revision HEAD ../linux-next ../backports-output/
 cd ../backports-output
 # configure, make, test
 cd ../linux-next
 git bisect <good|bad> # "good" if problem doesn't reproduce, "bad" if it does
 # goto repeat


Michał





