All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kalle Valo <kvalo@codeaurora.org>
To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Hemant Kumar <hemantk@codeaurora.org>,
	Bhaumik Bhatt <bbhatt@codeaurora.org>,
	linux-arm-msm@vger.kernel.org, ath11k@lists.infradead.org
Subject: Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
Date: Tue, 09 Mar 2021 18:44:04 +0200	[thread overview]
Message-ID: <87k0qgz38r.fsf@codeaurora.org> (raw)
In-Reply-To: <87blby6dir.fsf@codeaurora.org> (Kalle Valo's message of "Thu, 04 Mar 2021 19:21:16 +0200")

Kalle Valo <kvalo@codeaurora.org> writes:

> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>
>> Hi Kalle,
>>
>> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>>> Hi MHI folks,
>>> 
>>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>>> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
>>> so it looks like a new problem, but I cannot be 100% sure. Netconsole
>>> output is below. I have most of the kernel debug functionality enabled
>>> (KASAN etc).
>>> 
>>> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
>>> wireless and ath11k modules. It does not happen every time, but I would
>>> say I can reproduce the problem within 10 test loops or so.
>>> 
>>> Any ideas what could cause this? I have not bisected this due to lack of
>>> time, but I can test patches etc.
>>> 
>>
>> Not sure if this is related, Loic sent a patch which fixes an issue with
>> "mhi_pm_state_worker":
>>
>> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>>
>> Can you please test see if it fixes your issue also?
>
> Thanks for the link, but unfortunately not :( I was able to reproduce
> the issue just after 3 insmod/rmmod loops.

I investigated this a bit more, I was actually able to reproduce this in
v5.11 as well. So this is not a new regression. The reason why I started
seeing this until now is that I enable more debug options in the kernel,
the diff below. Without those changes I don't see the problem.

I also found a workround, if I add sleep(1) after insmod ath11k_pci in
my test script I see 200 loops without crashes. But when I removed the
sleep the test script crashed only after 19 loops. So there definitely
is a race condition somewhere, just don't know where. I don't have time
to investigate this more, so I'll just use the workaround for the time
being.

--- ../configs/nuc-debug-5.11	2021-02-21 08:55:53.836061988 +0200
+++ .config	2021-03-09 16:22:53.598684524 +0200
@@ -12,6 +12,7 @@
 CONFIG_CC_CAN_LINK_STATIC=y
 CONFIG_CC_HAS_ASM_GOTO=y
 CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_CONSTRUCTORS=y
 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y
@@ -280,6 +281,7 @@
 CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
 CONFIG_ZONE_DMA32=y
 CONFIG_AUDIT_ARCH=y
+CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
 CONFIG_HAVE_INTEL_TXT=y
 CONFIG_X86_64_SMP=y
 CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -748,8 +750,7 @@
 # CONFIG_MODULE_SIG is not set
 # CONFIG_MODULE_COMPRESS is not set
 # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
-# CONFIG_UNUSED_SYMBOLS is not set
-# CONFIG_TRIM_UNUSED_KSYMS is not set
+CONFIG_UNUSED_SYMBOLS=y
 CONFIG_MODULES_TREE_LOOKUP=y
 CONFIG_BLOCK=y
 CONFIG_BLK_SCSI_REQUEST=y
@@ -1164,7 +1165,6 @@
 # CONFIG_NET_NCSI is not set
 CONFIG_RPS=y
 CONFIG_RFS_ACCEL=y
-CONFIG_SOCK_RX_QUEUE_MAPPING=y
 CONFIG_XPS=y
 # CONFIG_CGROUP_NET_PRIO is not set
 # CONFIG_CGROUP_NET_CLASSID is not set
@@ -1685,7 +1685,6 @@
 #
 # Distributed Switch Architecture drivers
 #
-# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
 # end of Distributed Switch Architecture drivers
 
 CONFIG_ETHERNET=y
@@ -1700,6 +1699,7 @@
 # CONFIG_NET_VENDOR_AQUANTIA is not set
 # CONFIG_NET_VENDOR_ARC is not set
 # CONFIG_NET_VENDOR_ATHEROS is not set
+# CONFIG_NET_VENDOR_AURORA is not set
 # CONFIG_NET_VENDOR_BROADCOM is not set
 CONFIG_NET_VENDOR_BROCADE=y
 # CONFIG_BNA is not set
@@ -1914,7 +1914,6 @@
 # CONFIG_MT7615E is not set
 # CONFIG_MT7663U is not set
 # CONFIG_MT7915E is not set
-# CONFIG_MT7921E is not set
 # CONFIG_WLAN_VENDOR_MICROCHIP is not set
 CONFIG_WLAN_VENDOR_RALINK=y
 # CONFIG_RT2X00 is not set
@@ -4500,7 +4499,7 @@
 CONFIG_DEBUG_INFO_COMPRESSED=y
 # CONFIG_DEBUG_INFO_SPLIT is not set
 # CONFIG_DEBUG_INFO_DWARF4 is not set
-# CONFIG_GDB_SCRIPTS is not set
+CONFIG_GDB_SCRIPTS=y
 CONFIG_FRAME_WARN=2048
 # CONFIG_STRIP_ASM_SYMS is not set
 # CONFIG_READABLE_ASM is not set
@@ -4540,13 +4539,13 @@
 CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
 CONFIG_PAGE_OWNER=y
 CONFIG_PAGE_POISONING=y
-# CONFIG_DEBUG_PAGE_REF is not set
+CONFIG_DEBUG_PAGE_REF=y
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_ARCH_HAS_DEBUG_WX=y
 CONFIG_DEBUG_WX=y
 CONFIG_GENERIC_PTDUMP=y
 CONFIG_PTDUMP_CORE=y
-# CONFIG_PTDUMP_DEBUGFS is not set
+CONFIG_PTDUMP_DEBUGFS=y
 CONFIG_DEBUG_OBJECTS=y
 # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
 CONFIG_DEBUG_OBJECTS_FREE=y
@@ -4568,8 +4567,8 @@
 CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_VMACACHE=y
-# CONFIG_DEBUG_VM_RB is not set
-# CONFIG_DEBUG_VM_PGFLAGS is not set
+CONFIG_DEBUG_VM_RB=y
+CONFIG_DEBUG_VM_PGFLAGS=y
 CONFIG_DEBUG_VM_PGTABLE=y
 CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
 CONFIG_DEBUG_VIRTUAL=y
@@ -4581,7 +4580,13 @@
 CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
 CONFIG_CC_HAS_KASAN_GENERIC=y
 CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+CONFIG_KASAN_GENERIC=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_STACK=1
+CONFIG_KASAN_VMALLOC=y
+# CONFIG_TEST_KASAN_MODULE is not set
 # end of Memory Debugging
 
 CONFIG_DEBUG_SHIRQ=y

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

-- 
ath11k mailing list
ath11k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath11k

WARNING: multiple messages have this Message-ID (diff)
From: Kalle Valo <kvalo@codeaurora.org>
To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Hemant Kumar <hemantk@codeaurora.org>,
	Bhaumik Bhatt <bbhatt@codeaurora.org>,
	linux-arm-msm@vger.kernel.org, ath11k@lists.infradead.org
Subject: Re: [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds.
Date: Tue, 09 Mar 2021 18:44:04 +0200	[thread overview]
Message-ID: <87k0qgz38r.fsf@codeaurora.org> (raw)
In-Reply-To: <87blby6dir.fsf@codeaurora.org> (Kalle Valo's message of "Thu, 04 Mar 2021 19:21:16 +0200")

Kalle Valo <kvalo@codeaurora.org> writes:

> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes:
>
>> Hi Kalle,
>>
>> On Thu, Mar 04, 2021 at 04:59:33PM +0200, Kalle Valo wrote:
>>> Hi MHI folks,
>>> 
>>> I upgraded my QCA6390 x86 test box to v5.12-rc1 and started seeing
>>> kernel crashes when testing ath11k. I don't recall seeing this on v5.11
>>> so it looks like a new problem, but I cannot be 100% sure. Netconsole
>>> output is below. I have most of the kernel debug functionality enabled
>>> (KASAN etc).
>>> 
>>> I can fairly easy reproduce this by looping insmod and rmmod of mhi,
>>> wireless and ath11k modules. It does not happen every time, but I would
>>> say I can reproduce the problem within 10 test loops or so.
>>> 
>>> Any ideas what could cause this? I have not bisected this due to lack of
>>> time, but I can test patches etc.
>>> 
>>
>> Not sure if this is related, Loic sent a patch which fixes an issue with
>> "mhi_pm_state_worker":
>>
>> https://patchwork.kernel.org/project/linux-arm-msm/patch/1614161930-8513-1-git-send-email-loic.poulain@linaro.org/
>>
>> Can you please test see if it fixes your issue also?
>
> Thanks for the link, but unfortunately not :( I was able to reproduce
> the issue just after 3 insmod/rmmod loops.

I investigated this a bit more, I was actually able to reproduce this in
v5.11 as well. So this is not a new regression. The reason why I started
seeing this until now is that I enable more debug options in the kernel,
the diff below. Without those changes I don't see the problem.

I also found a workround, if I add sleep(1) after insmod ath11k_pci in
my test script I see 200 loops without crashes. But when I removed the
sleep the test script crashed only after 19 loops. So there definitely
is a race condition somewhere, just don't know where. I don't have time
to investigate this more, so I'll just use the workaround for the time
being.

--- ../configs/nuc-debug-5.11	2021-02-21 08:55:53.836061988 +0200
+++ .config	2021-03-09 16:22:53.598684524 +0200
@@ -12,6 +12,7 @@
 CONFIG_CC_CAN_LINK_STATIC=y
 CONFIG_CC_HAS_ASM_GOTO=y
 CONFIG_CC_HAS_ASM_INLINE=y
+CONFIG_CONSTRUCTORS=y
 CONFIG_IRQ_WORK=y
 CONFIG_BUILDTIME_TABLE_SORT=y
 CONFIG_THREAD_INFO_IN_TASK=y
@@ -280,6 +281,7 @@
 CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
 CONFIG_ZONE_DMA32=y
 CONFIG_AUDIT_ARCH=y
+CONFIG_KASAN_SHADOW_OFFSET=0xdffffc0000000000
 CONFIG_HAVE_INTEL_TXT=y
 CONFIG_X86_64_SMP=y
 CONFIG_ARCH_SUPPORTS_UPROBES=y
@@ -748,8 +750,7 @@
 # CONFIG_MODULE_SIG is not set
 # CONFIG_MODULE_COMPRESS is not set
 # CONFIG_MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is not set
-# CONFIG_UNUSED_SYMBOLS is not set
-# CONFIG_TRIM_UNUSED_KSYMS is not set
+CONFIG_UNUSED_SYMBOLS=y
 CONFIG_MODULES_TREE_LOOKUP=y
 CONFIG_BLOCK=y
 CONFIG_BLK_SCSI_REQUEST=y
@@ -1164,7 +1165,6 @@
 # CONFIG_NET_NCSI is not set
 CONFIG_RPS=y
 CONFIG_RFS_ACCEL=y
-CONFIG_SOCK_RX_QUEUE_MAPPING=y
 CONFIG_XPS=y
 # CONFIG_CGROUP_NET_PRIO is not set
 # CONFIG_CGROUP_NET_CLASSID is not set
@@ -1685,7 +1685,6 @@
 #
 # Distributed Switch Architecture drivers
 #
-# CONFIG_NET_DSA_MV88E6XXX_PTP is not set
 # end of Distributed Switch Architecture drivers
 
 CONFIG_ETHERNET=y
@@ -1700,6 +1699,7 @@
 # CONFIG_NET_VENDOR_AQUANTIA is not set
 # CONFIG_NET_VENDOR_ARC is not set
 # CONFIG_NET_VENDOR_ATHEROS is not set
+# CONFIG_NET_VENDOR_AURORA is not set
 # CONFIG_NET_VENDOR_BROADCOM is not set
 CONFIG_NET_VENDOR_BROCADE=y
 # CONFIG_BNA is not set
@@ -1914,7 +1914,6 @@
 # CONFIG_MT7615E is not set
 # CONFIG_MT7663U is not set
 # CONFIG_MT7915E is not set
-# CONFIG_MT7921E is not set
 # CONFIG_WLAN_VENDOR_MICROCHIP is not set
 CONFIG_WLAN_VENDOR_RALINK=y
 # CONFIG_RT2X00 is not set
@@ -4500,7 +4499,7 @@
 CONFIG_DEBUG_INFO_COMPRESSED=y
 # CONFIG_DEBUG_INFO_SPLIT is not set
 # CONFIG_DEBUG_INFO_DWARF4 is not set
-# CONFIG_GDB_SCRIPTS is not set
+CONFIG_GDB_SCRIPTS=y
 CONFIG_FRAME_WARN=2048
 # CONFIG_STRIP_ASM_SYMS is not set
 # CONFIG_READABLE_ASM is not set
@@ -4540,13 +4539,13 @@
 CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
 CONFIG_PAGE_OWNER=y
 CONFIG_PAGE_POISONING=y
-# CONFIG_DEBUG_PAGE_REF is not set
+CONFIG_DEBUG_PAGE_REF=y
 # CONFIG_DEBUG_RODATA_TEST is not set
 CONFIG_ARCH_HAS_DEBUG_WX=y
 CONFIG_DEBUG_WX=y
 CONFIG_GENERIC_PTDUMP=y
 CONFIG_PTDUMP_CORE=y
-# CONFIG_PTDUMP_DEBUGFS is not set
+CONFIG_PTDUMP_DEBUGFS=y
 CONFIG_DEBUG_OBJECTS=y
 # CONFIG_DEBUG_OBJECTS_SELFTEST is not set
 CONFIG_DEBUG_OBJECTS_FREE=y
@@ -4568,8 +4567,8 @@
 CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_VMACACHE=y
-# CONFIG_DEBUG_VM_RB is not set
-# CONFIG_DEBUG_VM_PGFLAGS is not set
+CONFIG_DEBUG_VM_RB=y
+CONFIG_DEBUG_VM_PGFLAGS=y
 CONFIG_DEBUG_VM_PGTABLE=y
 CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
 CONFIG_DEBUG_VIRTUAL=y
@@ -4581,7 +4580,13 @@
 CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
 CONFIG_CC_HAS_KASAN_GENERIC=y
 CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
-# CONFIG_KASAN is not set
+CONFIG_KASAN=y
+CONFIG_KASAN_GENERIC=y
+# CONFIG_KASAN_OUTLINE is not set
+CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_STACK=1
+CONFIG_KASAN_VMALLOC=y
+# CONFIG_TEST_KASAN_MODULE is not set
 # end of Memory Debugging
 
 CONFIG_DEBUG_SHIRQ=y

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

  reply	other threads:[~2021-03-09 16:44 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-04 14:59 [regression] mhi: mhi_pm_st_worker blocked for more than 61 seconds Kalle Valo
2021-03-04 14:59 ` Kalle Valo
2021-03-04 15:10 ` Manivannan Sadhasivam
2021-03-04 15:10   ` Manivannan Sadhasivam
2021-03-04 17:21   ` Kalle Valo
2021-03-04 17:21     ` Kalle Valo
2021-03-09 16:44     ` Kalle Valo [this message]
2021-03-09 16:44       ` Kalle Valo
2021-03-09 18:07       ` Bhaumik Bhatt
2021-03-09 18:07         ` Bhaumik Bhatt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0qgz38r.fsf@codeaurora.org \
    --to=kvalo@codeaurora.org \
    --cc=ath11k@lists.infradead.org \
    --cc=bbhatt@codeaurora.org \
    --cc=hemantk@codeaurora.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=manivannan.sadhasivam@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.