From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
To: wi nk <wink@technolu.st>
Cc: Stephen Liang <stephenliang7@gmail.com>,
Jeffrey Hugo <jhugo@codeaurora.org>,
Carl Huang <cjhuang@codeaurora.org>,
Bhaumik Bhatt <bbhatt@codeaurora.org>,
Bjorn Andersson <bjorn.andersson@linaro.org>,
Hemant Kumar <hemantk@codeaurora.org>,
ath11k@lists.infradead.org, Kalle Valo <kvalo@codeaurora.org>,
Mitchell Nordine <mail@mitchellnordine.com>
Subject: Re: ath11k: crashes with 1 MSI vector, workaround disable MHI M2 state
Date: Sun, 20 Dec 2020 20:35:03 +0530 [thread overview]
Message-ID: <20201220150503.GA4283@thinkpad> (raw)
In-Reply-To: <CAHUdJJX2fW0KigSgMT03PkCQKQyya4ZiRHrO4iopOS2GzT6NWA@mail.gmail.com>
Hi,
On Sat, Dec 19, 2020 at 10:34:23PM +0100, wi nk wrote:
> On Thu, Dec 17, 2020 at 10:53 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > Hi Kalle,
> >
> > On Wed, Dec 16, 2020 at 10:47:18AM +0200, Kalle Valo wrote:
> > > Hi MHI devs,
> > >
> >
> > [...]
> >
> > > After extensive debugging from wink he found out that disabling M2 state
> > > makes the all problems go away:
> > >
> > > --- a/drivers/bus/mhi/core/pm.c
> > > +++ b/drivers/bus/mhi/core/pm.c
> > > @@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
> > > },
> > > {
> > > MHI_PM_M0,
> > > - MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
> > > + MHI_PM_M0 | MHI_PM_M3_ENTER |
> > > MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> > > MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
> > > },
> > > {
> > > - MHI_PM_M2,
> > > + MHI_PM_M0,
> > > MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
> > > MHI_PM_LD_ERR_FATAL_DETECT
> > > },
> > >
> > > And indeed now we have numerous people reporting that with this
> > > workaround ath11k is stable on their Dell XPS 13 9310 laptops. What on
> > > earth could cause these kernel crashes/interrupt storms? And why is it
> > > visible only on Dell laptops? Why does disabling M2 state fix it?
> > >
> >
> > This is related to the ASPM state of the PCIe bus. In the meantime, I'd
> > suggest to turn off ASPM using "pcie_aspm=off" in the kernel command
> > line so that the MHI bus stays in M0.
> >
> > For debugging this issue, can someone enable debug logs for MHI and share
> > the dmesg output (with ASPM enabled ofc)?
> >
> > Thanks,
> > Mani
>
> Hi Mani,
>
> Thanks for the information and ideas. I tried to disable ASPM with
> the kernel parameter you mentioned, that didn't seem to work, so I
> removed ASPM support from my kernel altogether. I still see the
> adapter in the M1 state, which with my patch would've gone to M2 had
> it not been disabled. Is ASPM the only thing that will trigger the M*
> transitions? Would it require a transition to M2 regardless of
> settings (maybe that's why it tried)?
That's what I suspected but looks like the QCA6390 enters M1 state (which will
inturn cause host MHI to transition to M2) when it detects the link inactivity
using a timer. But with my NUC, I can't get QCA6390 to enter M1 state
regardless of the ASPM support in BIOS. In both conditions (with/without ASPM)
device just stays in M0. My hard is guess is that the device depends on the WAKE
sideband signal to go low for entering the M1 state even when it detects link
inactivity in the PCIe bus. And this signal might be low on Dell laptops. But I
need Hemant/Bhaumik to confirm this!
Inspite of that, I got a plenty of below messages in dmesg log when MHI
debugging is enabled:
local ee:AMSS device ee:AMSS dev_state:M0
local ee:AMSS device ee:AMSS dev_state:M0
local ee:AMSS device ee:AMSS dev_state:M0
local ee:AMSS device ee:AMSS dev_state:M0
local ee:AMSS device ee:AMSS dev_state:M0
And this only happens when one MSI vector is used and shared by all IRQs. This
is because for shared IRQs, the kernel calls all of the registered ISRs of the
interrupt line when an interrupt occurs. So I cooked up a patch which checks for
the device state before proceeding through the ISR:
diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 2cff5ddff225..520948f3051d 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -386,6 +386,13 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
state = mhi_get_mhi_state(mhi_cntrl);
ee = mhi_cntrl->ee;
mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
+
+ /* Only proceed if the device state is different */
+ if (mhi_cntrl->dev_state == state) {
+ write_unlock_irq(&mhi_cntrl->pm_lock);
+ goto exit_intvec;
+ }
+
dev_dbg(dev, "local ee:%s device ee:%s dev_state:%s\n",
TO_MHI_EXEC_STR(mhi_cntrl->ee), TO_MHI_EXEC_STR(ee),
TO_MHI_STATE_STR(state));
This prevents the spike of the debug messages but not sure if the issue you're
seeing is related to this. Can you give this patch a try on your setup?
Thanks,
Mani
> The MHI dmesg output is pretty
> consistent when it fails, it looks like this:
> https://i.imgur.com/0XExack.jpg . You can also see it in the mp4's
> I've placed here:
> https://drive.google.com/drive/folders/1wvxZI5XtwPSrm0-6-Ov50cUfqBXSXeNz?usp=sharing
> . Also note that the failure isn't deterministic, sometimes the
> transition to M2 will succeed and everything works.
>
> Thanks!
--
ath11k mailing list
ath11k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath11k
next prev parent reply other threads:[~2020-12-20 15:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-16 8:47 ath11k: crashes with 1 MSI vector, workaround disable MHI M2 state Kalle Valo
2020-12-17 8:41 ` Kalle Valo
2020-12-17 9:53 ` Manivannan Sadhasivam
2020-12-17 19:01 ` Stephen Liang
2020-12-19 21:34 ` wi nk
2020-12-20 15:05 ` Manivannan Sadhasivam [this message]
2020-12-20 15:39 ` wi nk
2020-12-21 17:15 ` Kalle Valo
2020-12-21 17:26 ` wi nk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201220150503.GA4283@thinkpad \
--to=manivannan.sadhasivam@linaro.org \
--cc=ath11k@lists.infradead.org \
--cc=bbhatt@codeaurora.org \
--cc=bjorn.andersson@linaro.org \
--cc=cjhuang@codeaurora.org \
--cc=hemantk@codeaurora.org \
--cc=jhugo@codeaurora.org \
--cc=kvalo@codeaurora.org \
--cc=mail@mitchellnordine.com \
--cc=stephenliang7@gmail.com \
--cc=wink@technolu.st \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.