* Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Timur Kristóf @ 2018-05-29 8:52 UTC (permalink / raw)
To: ath10k
Hi,
I'm experiencing a lot of ath10k driver crashes on both 4.16 (as
shipped in Fedora) and 4.17 (tried rc5 and rc6) using a Dell XPS 13
9370 with a QCA6174.
With 4.16 the driver crash always results in the whole system hanging.
With 4.17 it does not always hang the system completely, but it still
happens. I also tried upgrading the firmware, which reduces the
frequency of the crashes but does not eliminate them completely.
The common pattern I see in dmesg is this:
[30372.900832] wlp2s0: AP 64:7c:34:3f:c3:b0 changed bandwidth, new config is 2437 MHz, width 2 (2447/0 MHz)
Searching a bit in kernel sources, this message is printed by this line:
https://github.com/torvalds/linux/blob/master/net/mac80211/mlme.c#L364
And width 2 seems to correspond to NL80211_CHAN_WIDTH_40 here:
https://github.com/torvalds/linux/blob/master/include/uapi/linux/nl80211.h#L3880
So it seems that either the driver or the firmware can't cope with it when the AP changes the bandwidth.
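To make the decoding of that log line concrete, here is a minimal sketch that parses it and maps the numeric width to its enum name. The table follows the order of enum nl80211_chan_width in the uapi header of this kernel generation; the function name and the returned field names are hypothetical, chosen only for illustration, and the reading of the two bracketed numbers as center frequencies 1 and 2 is an assumption based on the mac80211 message format.

```python
import re

# Values of enum nl80211_chan_width, in header order (4.17-era uapi header).
NL80211_CHAN_WIDTH = {
    0: "NL80211_CHAN_WIDTH_20_NOHT",
    1: "NL80211_CHAN_WIDTH_20",
    2: "NL80211_CHAN_WIDTH_40",
    3: "NL80211_CHAN_WIDTH_80",
    4: "NL80211_CHAN_WIDTH_80P80",
    5: "NL80211_CHAN_WIDTH_160",
    6: "NL80211_CHAN_WIDTH_5",
    7: "NL80211_CHAN_WIDTH_10",
}

# Matches the "changed bandwidth" message as it appears in dmesg.
BW_CHANGE = re.compile(
    r"AP (?P<bssid>[0-9a-f:]{17}) changed bandwidth, new config is "
    r"(?P<freq>\d+) MHz, width (?P<width>\d+) "
    r"\((?P<center1>\d+)/(?P<center2>\d+) MHz\)"
)


def parse_bw_change(line):
    """Return the decoded fields of a bandwidth-change line, or None."""
    match = BW_CHANGE.search(line)
    if match is None:
        return None
    width = int(match.group("width"))
    return {
        "bssid": match.group("bssid"),
        "freq_mhz": int(match.group("freq")),
        "width": NL80211_CHAN_WIDTH.get(width, "unknown (%d)" % width),
        "center1_mhz": int(match.group("center1")),
        "center2_mhz": int(match.group("center2")),
    }
```

Fed the dmesg line above, this reports a 40 MHz channel with the control channel at 2437 MHz and the center frequency at 2447 MHz.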
Here is the Fedora bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1577106
Is there anything that can be done to improve the stability of ath10k on this system?
I see there is a lot going on in the upstream repo, but I'm not sure which branch to try: ath-next or qca?
Best regards,
Tim
ps. I'm not on the mailing list, please add me to CC when you reply. Thanks.
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Kalle Valo @ 2018-05-29 10:21 UTC (permalink / raw)
To: Timur Kristóf; +Cc: ath10k, ryanhsu
+ ryan
Timur Kristóf <timur.kristof@gmail.com> writes:
> I'm experiencing a lot of ath10k driver crashes on both 4.16 (as
> shipped in Fedora) and 4.17 (tried rc5 and rc6) using a Dell XPS 13
> 9370 with a QCA6174.
>
> With 4.16 the driver crash always results in the whole system hanging.
> With 4.17 it does not always hang the system completely, but it still
> happends. I also tried to upgrade the firmware which also reduces the
> frequency of the crashes but does not eliminate them completely.
>
> The common pattern I see in dmesg is this:
> [30372.900832] wlp2s0: AP 64:7c:34:3f:c3:b0 changed bandwidth, new
> config is 2437 MHz, width 2 (2447/0 MHz)
>
> Searching a bit in kernel sources, this message is printed by this line:
> https://github.com/torvalds/linux/blob/master/net/mac80211/mlme.c#L364
> And width 2 seems to correspond to NL80211_CHAN_WIDTH_40 here:
> https://github.com/torvalds/linux/blob/master/include/uapi/linux/nl80211.h#L3880
>
> So it seems that either the driver or the firmware can't cope with it
> when the AP changes the bandwidth.
> Here is the Fedora bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1577106
>
> Is there anything that can be done to improve the stability of ath10k on this system?
> I see there is a lot going on in the upstream repo, but not sure which
> branch to try: ath-next or qca?
I think this patch should fix the firmware crash:
ath10k: Update the phymode along with bandwidth change request
https://patchwork.kernel.org/patch/10183453/
Unfortunately it didn't apply to my tree so I had to drop it. A few days
ago I was talking with Ryan and he said he is planning to send v2, but
I'm sure he doesn't mind if someone else sends the v2 instead. The
faster we can apply the patch and get it to stable releases, the better,
as this seems to be a very common issue.
But there also seems to be another bug related to firmware restart: it
shouldn't crash the kernel! Does the kernel also crash when you crash the
firmware using the simulate_fw_crash debugfs file?
--
Kalle Valo
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Timur Kristóf @ 2018-05-29 12:32 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath10k, ryanhsu
On Tue, 2018-05-29 at 13:21 +0300, Kalle Valo wrote:
> > So it seems that either the driver or the firmware can't cope with
> > it
> > when the AP changes the bandwidth.
> > Here is the Fedora bug report:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1577106
> >
> > Is there anything that can be done to improve the stability of
> > ath10k on this system?
> > I see there is a lot going on in the upstream repo, but not sure
> > which
> > branch to try: ath-next or qca?
>
> I think this patch should fix the firmware crash:
>
> ath10k: Update the phymode along with bandwidth change request
>
> https://patchwork.kernel.org/patch/10183453/
>
> Unfortunately it didn't apply to my tree so I had to drop it. Few
> days
> ago I was talking with Ryan and he said he is planning to send v2,
> but
> I'm sure he don't mind if someone else can send the v2 either. The
> faster we can apply the patch, and get it to stable releases, the
> better
> as we seems to be a very common issue.
I don't have much experience with this, but will take a look and let
you know how that goes.
> But there also seems to be another bug related to firmware restart,
> it
> shouldn't crash the kernel! Does the kernel crash also when crashing
> the
> kernel using simulate_fw_crash debugfs file?
Yes, I tried it just now. The driver can survive 2 simulated hard
crashes. After the 3rd hard crash, it reproduces the same issue that I
described in the bug report: Wifi stops working. I can turn it off. When
I turn it on again, the whole system hangs completely.
You would expect that a single driver crash should not affect the whole
system like this, but it does.
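For anyone who wants to repeat this test, here is a minimal sketch of triggering the simulated crashes from a script. The debugfs path is an assumption: it presumes debugfs is mounted at /sys/kernel/debug and that the device is phy0, both of which vary between systems. The function name, the default delay and the default count are hypothetical. Writing to the file requires root, and on an affected kernel this can hang the whole machine.

```python
import time

# Assumed location of the ath10k debugfs file; "phy0" depends on the system.
SIMULATE_FW_CRASH = "/sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash"


def simulate_fw_crashes(path=SIMULATE_FW_CRASH, mode="hard", count=3, delay_s=10.0):
    """Write `mode` to the simulate_fw_crash file `count` times.

    The pause between writes is meant to give the driver time to restart
    the firmware; the 10 second default is a guess, not a measured value.
    """
    for i in range(count):
        with open(path, "w") as crash_file:
            crash_file.write(mode)
        if i + 1 < count:
            time.sleep(delay_s)
    return count
```

Calling simulate_fw_crashes() with the defaults corresponds to the three hard crashes described above.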
Is there a chance this is already fixed upstream? In general, which of
your branches should I try to use with this device?
ath-next or ath-qca?
Thanks & best regards,
Timur
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Kalle Valo @ 2018-05-29 13:29 UTC (permalink / raw)
To: Timur Kristóf; +Cc: ath10k, ryanhsu
Timur Kristóf <timur.kristof@gmail.com> writes:
> On Tue, 2018-05-29 at 13:21 +0300, Kalle Valo wrote:
>> > So it seems that either the driver or the firmware can't cope with
>> > it
>> > when the AP changes the bandwidth.
>> > Here is the Fedora bug report:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1577106
>> >
>> > Is there anything that can be done to improve the stability of
>> > ath10k on this system?
>> > I see there is a lot going on in the upstream repo, but not sure
>> > which
>> > branch to try: ath-next or qca?
>>
>> I think this patch should fix the firmware crash:
>>
>> ath10k: Update the phymode along with bandwidth change request
>>
>> https://patchwork.kernel.org/patch/10183453/
>>
>> Unfortunately it didn't apply to my tree so I had to drop it. Few
>> days ago I was talking with Ryan and he said he is planning to send
>> v2, but I'm sure he don't mind if someone else can send the v2
>> either. The faster we can apply the patch, and get it to stable
>> releases, the better as we seems to be a very common issue.
>
> I don't have much experience with this, but will take a look and let
> you know how that goes.
Ok, let me know how it goes. Some info and pointers here:
https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches
>> But there also seems to be another bug related to firmware restart,
>> it
>> shouldn't crash the kernel! Does the kernel crash also when crashing
>> the
>> kernel using simulate_fw_crash debugfs file?
>
> Yes, tried it just now. The driver can survive 2 simulated hard
> crashes. After the 3rd hard crash, it reproduces the same issue that I
> described in the bugreport: Wifi stops working. I can turn it off. When
> I turn it on again, the whole system hangs completely.
>
> You would expect that a single driver crash should not affect the whole
> system like this, but it does.
Yeah, it definitely should not crash like that. I wonder what has broken
it.
> Is there a chance this is already fixed upstream? In general, which of
> your branches should I try to use with this device? ath-next or
> ath-qca?
I doubt either of the bugs (the firmware crash and the kernel hang) is
fixed; at least I don't recall seeing any fixes. But if you have time to
test, use the master branch of my ath.git tree for any testing.
https://wireless.wiki.kernel.org/en/users/drivers/ath10k/sources
--
Kalle Valo
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Timur Kristóf @ 2018-05-29 19:36 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath10k, ryanhsu
On Tue, 2018-05-29 at 16:29 +0300, Kalle Valo wrote:
> Timur Kristóf <timur.kristof@gmail.com> writes:
> > I don't have much experience with this, but will take a look and let
> > you know how that goes.
>
> Ok, let me know how it goes. Some info and pointers here:
> https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submittingpatches
Thanks!
It does not apply to 4.17-rc6 as-is either, but it seems fixable at
first glance.
> > You would expect that a single driver crash should not affect the whole
> > system like this, but it does.
> Yeah, it definitely should not crash like that. I wonder what has broken
> it.
Do you mean that this used to work correctly at some point? If that is
the case, can you tell me which version it worked with?
> I doubt neither of the bugs (the firmware crash and the kernel hang) are
> fixed, at least I don't recall seeing any fixes. But if you have time to
> test, use the master branch of my ath.git tree for any testing.
All right, I will take a look at master as well.
Were you able to reproduce these issues on master?
Thanks & best regards,
Timur
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Timur Kristóf @ 2018-05-30 7:15 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath10k, ryanhsu
On Tue, 2018-05-29 at 22:36 +0300, Timur Kristóf wrote:
> On Tue, 2018-05-29 at 16:29 +0300, Kalle Valo wrote:
> > Timur Kristóf <timur.kristof@gmail.com> writes:
> > > I don't have much experience with this, but will take a look and
> > > let
> > > you know how that goes.
> >
> > Ok, let me know how it goes. Some info and pointers here:
> > https://wireless.wiki.kernel.org/en/users/drivers/ath10k/submitting
> > patches
>
> Thanks!
> It does not apply to 4.17-rc6 as-is, either, but it seems fixable at
> the first glance.
>
Basically, the author forgot to define WMI_PEER_PHYMODE, but some
searching turned up another QCA driver where it is 0xd, so I decided to
go with that. The other error was trivial.
I fixed the patch and applied it locally on top of 4.17-rc7. I am
currently running it to test whether it really mitigates the issue. I
will report back when I have results, and submit the patch if I'm satisfied.
Cheers,
Tim
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Ryan Hsu @ 2018-06-08 18:37 UTC (permalink / raw)
To: Timur Kristóf, Kalle Valo; +Cc: ath10k, ryanhsu
On 05/30/2018 12:15 AM, Timur Kristóf wrote:
> On Tue, 2018-05-29 at 22:36 +0300, Timur Kristóf wrote:
>>
>> Thanks!
>> It does not apply to 4.17-rc6 as-is, either, but it seems fixable at
>> the first glance.
>>
> Basically, the author forgot to define WMI_PEER_PHYMODE but some
> searching found another QCA driver where it is 0xd so I decided to go
> with that. The other error was trivial.
>
> I fixed the patch and applied locally on top of 4.17-rc7. Currently
> running it to test if it really mitigates the issue. Will report back
> when I have results, and submit the patch if I'm satisfied.
>
Sorry, I got dragged into something else and didn't update this promptly.
I just sent the v2 including the missing WMI_PEER_PHYMODE.
Can you pick up the v2 and see if that helps with your issue?
https://patchwork.kernel.org/patch/10454967/
--
Ryan Hsu
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: timur.kristof @ 2018-06-08 19:27 UTC (permalink / raw)
To: Ryan Hsu, Kalle Valo; +Cc: ath10k
On Fri, 2018-06-08 at 11:37 -0700, Ryan Hsu wrote:
> On 05/30/2018 12:15 AM, Timur Kristóf wrote:
>
> > On Tue, 2018-05-29 at 22:36 +0300, Timur Kristóf wrote:
> > >
> > > Thanks!
> > > It does not apply to 4.17-rc6 as-is, either, but it seems fixable
> > > at
> > > the first glance.
> > >
> >
> > Basically, the author forgot to define WMI_PEER_PHYMODE but some
> > searching found another QCA driver where it is 0xd so I decided to
> > go
> > with that. The other error was trivial.
> >
> > I fixed the patch and applied locally on top of 4.17-rc7. Currently
> > running it to test if it really mitigates the issue. Will report
> > back
> > when I have results, and submit the patch if I'm satisfied.
> >
>
> Sorry, I've been dragged by something else and didn't update this
> promptly.
> I just sent the v2 including the missing WMI_PEER_PHYMODE.
>
> Can you pick up the v2 and see if that help your issue?
>
> https://patchwork.kernel.org/patch/10454967/
>
Hi Ryan,
I already posted a patch v2 here:
http://lists.infradead.org/pipermail/ath10k/2018-May/011545.html
(Basically made the same changes as you.)
Yes, it fixes the problem.
Best regards,
Tim
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Ryan Hsu @ 2018-06-08 20:25 UTC (permalink / raw)
To: timur.kristof, Kalle Valo; +Cc: ath10k
On 06/08/2018 12:27 PM, timur.kristof@gmail.com wrote:
> Hi Ryan,
>
> I already posted a patch v2 here:
> http://lists.infradead.org/pipermail/ath10k/2018-May/011545.html
> (Basically made the same changes as you.)
> Yes, it fixes the problem.
Cool, thanks a lot!
--
Ryan Hsu
* Re: Frequent ath10k crashes with QCA6174 on kernels 4.16 and 4.17
From: Timur Kristóf @ 2018-06-09 12:56 UTC (permalink / raw)
To: Ryan Hsu; +Cc: ath10k, Kalle Valo
On Fri, Jun 8, 2018 at 10:25 PM, Ryan Hsu <ryanhsu@codeaurora.org> wrote:
> On 06/08/2018 12:27 PM, timur.kristof@gmail.com wrote:
>> Hi Ryan,
>>
>> I already posted a patch v2 here:
>> http://lists.infradead.org/pipermail/ath10k/2018-May/011545.html
>> (Basically made the same changes as you.)
>> Yes, it fixes the problem.
>
> Cool, thanks a lot!
Sure thing. I'm happy to see this issue go away!
During my testing I just ran the system for some time while connected
to an AP which I set into 20MHz/40MHz mode.
Without the patch: the firmware crashes, which after a while takes the
whole system down with it.
With the patch: I observed several successful bandwidth changes in the
logs, without any crash at all.
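A check like this can be scripted against a saved kernel log. The sketch below counts bandwidth-change messages and firmware-crash messages; the "changed bandwidth" text comes from the dmesg line quoted at the start of this thread, while the "firmware crashed" marker is an assumption about how ath10k words its crash report and may need adjusting. The function name is hypothetical.

```python
def summarize_log(lines):
    """Count bandwidth changes and firmware crashes in kernel log lines.

    Returns a (bandwidth_changes, firmware_crashes) tuple.
    """
    bandwidth_changes = 0
    firmware_crashes = 0
    for line in lines:
        if "changed bandwidth" in line:
            bandwidth_changes += 1
        # Assumed marker text for an ath10k firmware crash report.
        if "firmware crashed" in line:
            firmware_crashes += 1
    return bandwidth_changes, firmware_crashes
```

With the log saved to a file (for example via dmesg > dmesg.txt), summarize_log(open("dmesg.txt")) on a patched kernel would be expected to show a non-zero first count and a zero second count.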
I believe the patch is ready to get merged. Doesn't matter if yours or
mine, as they are basically the same.