From: Kalle Valo <kvalo@codeaurora.org>
To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>,
	Hemant Kumar <hemantk@codeaurora.org>,
	Jeffrey Hugo <jhugo@codeaurora.org>,
	Bhaumik Bhatt <bbhatt@codeaurora.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Stephen Liang <stephenliang7@gmail.com>,
	Carl Huang <cjhuang@codeaurora.org>,
	ath11k@lists.infradead.org, wink@technolu.st,
	Mitchell Nordine <mail@mitchellnordine.com>
Subject: ath11k: crashes with 1 MSI vector, workaround disable MHI M2 state
Date: Wed, 16 Dec 2020 10:47:18 +0200
Message-ID: <87pn3axhm1.fsf@codeaurora.org>

Hi MHI devs,

To keep the discussion organised I'll start a new thread about the weird
kernel crashes we are seeing on ath11k, and include the MHI folks as well
in case they have any ideas. This is a long story, but I'll try to
summarise it as briefly as I can :)

Recently Dell released laptops with QCA6390. Unfortunately there's a
BIOS bug[1] and ath11k only receives 1 MSI vector, as opposed to the 32
vectors it needs. Carl implemented a proof of concept patch[2] which
worked fine on some platforms, for example I didn't see any issues on my
Intel NUC with QCA6390.

But once people with a Dell XPS 13 9310 started testing Carl's patches,
they started reporting weird kernel crashes. This is what wink reported[3]:

----------------------------------------------------------------------
So up until this point, everything is working without issues.
Everything seems to spiral out of control a couple of seconds later
when my system attempts to actually bring up the adapter.  In most of
the crash states I will see this:

[   31.286725] wlp85s0: send auth to ec:08:6b:27:01:ea (try 1/3)
[   31.390187] wlp85s0: send auth to ec:08:6b:27:01:ea (try 2/3)
[   31.391928] wlp85s0: authenticated
[   31.394196] wlp85s0: associate with ec:08:6b:27:01:ea (try 1/3)
[   31.396513] wlp85s0: RX AssocResp from ec:08:6b:27:01:ea (capab=0x411 status=0 aid=6)
[   31.407730] wlp85s0: associated
[   31.434354] IPv6: ADDRCONF(NETDEV_CHANGE): wlp85s0: link becomes ready

And then either somewhere in that pile of messages, or a second or two
after this my machine will start to stutter as I mentioned before, and
then it either hangs, or I see this message (I'm truncating the
timestamp):

[   35.xxxx ] sched: RT throttling activated

After that moment, the machine is unresponsive.  Sorry I can't seem to
extract this data other than screenshots from my phone at the moment,
you can see the dmesg output from 6 different hangs here:

https://github.com/w1nk/ath11k-debug
----------------------------------------------------------------------

Wink even made videos available[3].

After extensive debugging, wink found out that disabling the M2 state
makes all the problems go away:

--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
        },
        {
                MHI_PM_M0,
-               MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
+               MHI_PM_M0 | MHI_PM_M3_ENTER |
                MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
                MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
        },
        {
-               MHI_PM_M2,
+               MHI_PM_M0,
                MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
                MHI_PM_LD_ERR_FATAL_DETECT
        },
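To illustrate what the workaround changes, here is a minimal standalone C
sketch of a transition table like the one in the diff. The names
(pm_state, pm_transition, transition_allowed and the two tables) and the
bit values are hypothetical and simplified, and the lookup is a plain
linear scan, which may not match how the MHI core actually consults
dev_state_transitions. It models only the two rows touched above, with a
reduced set of target states.

```c
#include <stddef.h>

/* Hypothetical, simplified state bits; not the real MHI core values. */
enum pm_state {
	PM_M0             = 1u << 0,
	PM_M2             = 1u << 1,
	PM_M3_ENTER       = 1u << 2,
	PM_SYS_ERR_DETECT = 1u << 3,
};

/* One table row: a source state and a bitmask of allowed target states. */
struct pm_transition {
	enum pm_state from_state;
	unsigned int to_states;
};

#define TABLE_SIZE(t) (sizeof(t) / sizeof((t)[0]))

/* Rows as they look before the workaround: M0 may enter M2. */
static const struct pm_transition stock_table[] = {
	{ PM_M0, PM_M0 | PM_M2 | PM_M3_ENTER | PM_SYS_ERR_DETECT },
	{ PM_M2, PM_M0 | PM_SYS_ERR_DETECT },
};

/*
 * Rows as they look after the workaround: M2 is dropped from the M0 row
 * and the former M2 row is relabelled, so no row starts from M2 any more.
 */
static const struct pm_transition workaround_table[] = {
	{ PM_M0, PM_M0 | PM_M3_ENTER | PM_SYS_ERR_DETECT },
	{ PM_M0, PM_M0 | PM_SYS_ERR_DETECT },
};

/* Return 1 if some row of the table permits cur -> next, 0 otherwise. */
static int transition_allowed(const struct pm_transition *table, size_t n,
			      enum pm_state cur, enum pm_state next)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i].from_state == cur &&
		    (table[i].to_states & next))
			return 1;
	}
	return 0;
}
```

With these tables, transition_allowed(stock_table, TABLE_SIZE(stock_table),
PM_M0, PM_M2) returns 1, while the same call against workaround_table
returns 0. So in this model the device can never move from M0 to M2 once
the workaround is applied, while M0 to M3_ENTER stays allowed.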

And indeed now we have numerous people reporting that with this
workaround ath11k is stable on their Dell XPS 13 9310 laptops. What on
earth could cause these kernel crashes/interrupt storms? And why is it
visible only on Dell laptops? Why does disabling M2 state fix it?

Another thing to investigate: does AC power vs battery power have
something to do with this? Can that affect the M2 state somehow?

Any other ideas on how to debug this? This is a very weird problem.

Wink and others, in case I missed something please do fill in.

Kalle

[1] https://lore.kernel.org/ath11k/87mtzxkus5.fsf@nanos.tec.linutronix.de/

[2] https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/?h=ath11k-qca6390-bringup&id=742b5de85acf7f25ca327c66c2b71d4f2cb6c245

[3] https://drive.google.com/drive/folders/1wvxZI5XtwPSrm0-6-Ov50cUfqBXSXeNz?usp=sharing

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

-- 
ath11k mailing list
ath11k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath11k
