public inbox for linux-arm-msm@vger.kernel.org
From: Bhaumik Bhatt <bbhatt@codeaurora.org>
To: Kalle Valo <kvalo@codeaurora.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>,
	Hemant Kumar <hemantk@codeaurora.org>,
	linux-arm-msm@vger.kernel.org, ath11k@lists.infradead.org,
	kvalo=codeaurora.org@codeaurora.org
Subject: Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
Date: Tue, 09 Mar 2021 10:07:48 -0800
Message-ID: <f682aa375f6501ff27aceaa687e5d694@codeaurora.org>
In-Reply-To: <87k0qgz38r.fsf@codeaurora.org>

On 2021-03-09 08:44 AM, Kalle Valo wrote:
> Kalle Valo <kvalo@codeaurora.org> writes:
> 
>> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>> 
>>> Hi Kalle,
>>> 
>>> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>>>> Hi MHI folks,
>>>> 
>>>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>>>> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
>>>> so it looks like a new problem, but I cannot be 100% sure. Netconsole
>>>> output is below. I have most of the kernel debug functionality enabled
>>>> (KASAN etc).
>>>> 
>>>> I can fairly easily reproduce this by looping insmod and rmmod of mhi,
>>>> wireless and ath11k modules. It does not happen every time, but I would
>>>> say I can reproduce the problem within 10 test loops or so.
>>>> 
>>>> Any ideas what could cause this? I have not bisected this due to lack of
>>>> time, but I can test patches etc.
>>>> 
>>> 
>>> Not sure if this is related, Loic sent a patch which fixes an issue with
>>> "mhi_pm_state_worker":
>>> 
>>> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>>> 
>>> Can you please test and see if it fixes your issue as well?
>> 
>> Thanks for the link, but unfortunately not :( I was able to reproduce
>> the issue after just 3 insmod/rmmod loops.
> 
> I investigated this a bit more, and I was actually able to reproduce this
> in v5.11 as well. So this is not a new regression. The reason I only
> started seeing this now is that I enabled more debug options in the
> kernel; see the diff below. Without those changes I don't see the problem.
> 
> I also found a workaround: if I add sleep(1) after insmod ath11k_pci in
> my test script I see 200 loops without crashes. But when I removed the
> sleep the test script crashed after only 19 loops. So there definitely
> is a race condition somewhere, I just don't know where. I don't have time
> to investigate this more, so I'll just use the workaround for the time
> being.
> 
> --- ../configs/nuc-debug-5.11	2021-02-21 08:55:53.836061988 +0200
> +++ .config	2021-03-09 16:22:53.598684524 +0200
> @@ -12,6 +12,7 @@
>  CONFIG_CC_CAN_LINK_STATIC=y
>  CONFIG_CC_HAS_ASM_GOTO=y
>  CONFIG_CC_HAS_ASM_INLINE=y
> +CONFIG_CONSTRUCTORS=y
>  CONFIG_IRQ_WORK=y
>  CONFIG_BUILDTIME_TABLE_SORT=y
>  CONFIG_THREAD_INFO_IN_TASK=y
> @@ -280,6 +281,7 @@
>  CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
>  CONFIG_ZONE_DMA32=y
>  CONFIG_AUDIT_ARCH=y
> +CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
>  CONFIG_HAVE_INTEL_TXT=y
>  CONFIG_X86_64_SMP=y
>  CONFIG_ARCH_SUPPORTS_UPROBES=y
> @@ -748,8 +750,7 @@
>  # CONFIG_MODULE_SIG is not set
>  # CONFIG_MODULE_COMPRESS is not set
>  # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
> -# CONFIG_UNUSED_SYMBOLS is not set
> -# CONFIG_TRIM_UNUSED_KSYMS is not set
> +CONFIG_UNUSED_SYMBOLS=y
>  CONFIG_MODULES_TREE_LOOKUP=y
>  CONFIG_BLOCK=y
>  CONFIG_BLK_SCSI_REQUEST=y
> @@ -1164,7 +1165,6 @@
>  # CONFIG_NET_NCSI is not set
>  CONFIG_RPS=y
>  CONFIG_RFS_ACCEL=y
> -CONFIG_SOCK_RX_QUEUE_MAPPING=y
>  CONFIG_XPS=y
>  # CONFIG_CGROUP_NET_PRIO is not set
>  # CONFIG_CGROUP_NET_CLASSID is not set
> @@ -1685,7 +1685,6 @@
>  #
>  # Distributed Switch Architecture drivers
>  #
> -# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
>  # end of Distributed Switch Architecture drivers
> 
>  CONFIG_ETHERNET=y
> @@ -1700,6 +1699,7 @@
>  # CONFIG_NET_VENDOR_AQUANTIA is not set
>  # CONFIG_NET_VENDOR_ARC is not set
>  # CONFIG_NET_VENDOR_ATHEROS is not set
> +# CONFIG_NET_VENDOR_AURORA is not set
>  # CONFIG_NET_VENDOR_BROADCOM is not set
>  CONFIG_NET_VENDOR_BROCADE=y
>  # CONFIG_BNA is not set
> @@ -1914,7 +1914,6 @@
>  # CONFIG_MT7615E is not set
>  # CONFIG_MT7663U is not set
>  # CONFIG_MT7915E is not set
> -# CONFIG_MT7921E is not set
>  # CONFIG_WLAN_VENDOR_MICROCHIP is not set
>  CONFIG_WLAN_VENDOR_RALINK=y
>  # CONFIG_RT2X00 is not set
> @@ -4500,7 +4499,7 @@
>  CONFIG_DEBUG_INFO_COMPRESSED=y
>  # CONFIG_DEBUG_INFO_SPLIT is not set
>  # CONFIG_DEBUG_INFO_DWARF4 is not set
> -# CONFIG_GDB_SCRIPTS is not set
> +CONFIG_GDB_SCRIPTS=y
>  CONFIG_FRAME_WARN=2048
>  # CONFIG_STRIP_ASM_SYMS is not set
>  # CONFIG_READABLE_ASM is not set
> @@ -4540,13 +4539,13 @@
>  CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
>  CONFIG_PAGE_OWNER=y
>  CONFIG_PAGE_POISONING=y
> -# CONFIG_DEBUG_PAGE_REF is not set
> +CONFIG_DEBUG_PAGE_REF=y
>  # CONFIG_DEBUG_RODATA_TEST is not set
>  CONFIG_ARCH_HAS_DEBUG_WX=y
>  CONFIG_DEBUG_WX=y
>  CONFIG_GENERIC_PTDUMP=y
>  CONFIG_PTDUMP_CORE=y
> -# CONFIG_PTDUMP_DEBUGFS is not set
> +CONFIG_PTDUMP_DEBUGFS=y
>  CONFIG_DEBUG_OBJECTS=y
>  # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
>  CONFIG_DEBUG_OBJECTS_FREE=y
> @@ -4568,8 +4567,8 @@
>  CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
>  CONFIG_DEBUG_VM=y
>  CONFIG_DEBUG_VM_VMACACHE=y
> -# CONFIG_DEBUG_VM_RB is not set
> -# CONFIG_DEBUG_VM_PGFLAGS is not set
> +CONFIG_DEBUG_VM_RB=y
> +CONFIG_DEBUG_VM_PGFLAGS=y
>  CONFIG_DEBUG_VM_PGTABLE=y
>  CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
>  CONFIG_DEBUG_VIRTUAL=y
> @@ -4581,7 +4580,13 @@
>  CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
>  CONFIG_CC_HAS_KASAN_GENERIC=y
>  CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
> -# CONFIG_KASAN is not set
> +CONFIG_KASAN=y
> +CONFIG_KASAN_GENERIC=y
> +# CONFIG_KASAN_OUTLINE is not set
> +CONFIG_KASAN_INLINE=y
> +CONFIG_KASAN_STACK=1
> +CONFIG_KASAN_VMALLOC=y
> +# CONFIG_TEST_KASAN_MODULE is not set
>  # end of Memory Debugging
> 
>  CONFIG_DEBUG_SHIRQ=y

Thanks Kalle. We will need to reproduce and investigate this on a
different controller as well, and then fix any race we find.

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
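
For anyone trying to reproduce this, the insmod/rmmod loop and the sleep(1)
workaround Kalle describes above can be sketched roughly as follows. This is
not his actual test script, which was not posted. The module list, the load
order, and the use of modprobe/rmmod instead of insmod are assumptions; only
mhi, ath11k, ath11k_pci and the one-second pause come from the thread.

```python
import subprocess
import time

# Assumed load order. The thread only mentions "mhi, wireless and ath11k
# modules" and ath11k_pci, so cfg80211/mac80211 and the ordering are guesses.
MODULES = ["mhi", "cfg80211", "mac80211", "ath11k", "ath11k_pci"]


def plan_iteration(modules=MODULES, settle_seconds=1.0):
    """Return the steps for one load/unload cycle as (action, argument) pairs."""
    steps = [("load", m) for m in modules]
    if settle_seconds > 0:
        # The workaround: pause after the last module (ath11k_pci) is loaded.
        steps.append(("sleep", settle_seconds))
    # Unload in reverse order so dependent modules go first.
    steps += [("unload", m) for m in reversed(modules)]
    return steps


def run(iterations, settle_seconds=1.0, dry_run=True):
    """Run the loop; with dry_run=True it only prints what it would do."""
    for i in range(iterations):
        for action, arg in plan_iteration(settle_seconds=settle_seconds):
            if action == "sleep":
                print(f"[{i}] sleep {arg}")
                if not dry_run:
                    time.sleep(arg)
                continue
            # modprobe/rmmod are an assumption; the thread uses insmod/rmmod.
            cmd = ["modprobe", arg] if action == "load" else ["rmmod", arg]
            print(f"[{i}] {' '.join(cmd)}")
            if not dry_run:
                subprocess.run(cmd, check=True)  # requires root


if __name__ == "__main__":
    # Dry run only: prints the command sequence without touching the kernel.
    run(iterations=2, dry_run=True)
```

Passing settle_seconds=0 drops the pause, which corresponds to the variant
that crashed in Kalle's testing; dry_run=False actually loads and unloads the
modules and needs root.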

Thread overview: 5+ messages
2021-03-04 14:59 [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds Kalle Valo
2021-03-04 15:10 ` Manivannan Sadhasivam
2021-03-04 17:21   ` Kalle Valo
2021-03-09 16:44     ` Kalle Valo
2021-03-09 18:07       ` Bhaumik Bhatt [this message]
