[PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation

* [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
@ 2025-04-29 12:20 Muhammad Usama Anjum
  2025-05-01 16:00 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-04-29 12:20 UTC
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Alex Elder,
	Jacek Lawrynowicz, Kunwu Chan, Greg Kroah-Hartman, Troy Hanson,
	Dr. David Alan Gilbert
  Cc: kernel, mhi, linux-arm-msm, linux-kernel, linux-wireless, ath11k,
	ath12k

Fix the dma_direct_alloc() failure seen at resume time while allocating
the BHIe tables. A crash report shows that, under memory pressure and
fragmentation, the DMA allocation fails at resume and MHI cannot
re-initialize.

To fix this, don't free the memory at power down during suspend /
hibernation. Instead, reuse the already allocated memory after every
resume from suspend or hibernation. This patch has been tested with
both suspend/resume and hibernation.

The rddm_image size is constant for a given hardware, while the
fbc_image size depends on the firmware. If the firmware size changes,
free the old table and allocate a new one.

Here are the crash logs:

[ 3029.338587] mhi mhi0: Requested to power ON
[ 3029.338621] mhi mhi0: Power on setup success
[ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
[ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
[ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
[ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
[ 3029.668717] Call Trace:
[ 3029.668722]  <TASK>
[ 3029.668728]  dump_stack_lvl+0x4e/0x70
[ 3029.668738]  warn_alloc+0x164/0x190
[ 3029.668747]  ? srso_return_thunk+0x5/0x5f
[ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
[ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
[ 3029.668774]  __alloc_pages_noprof+0x321/0x350
[ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
[ 3029.668790]  dma_direct_alloc+0x70/0x270
[ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668844]  ? srso_return_thunk+0x5/0x5f
[ 3029.668853]  process_one_work+0x17e/0x330
[ 3029.668861]  worker_thread+0x2ce/0x3f0
[ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
[ 3029.668873]  kthread+0xd2/0x100
[ 3029.668879]  ? __pfx_kthread+0x10/0x10
[ 3029.668885]  ret_from_fork+0x34/0x50
[ 3029.668892]  ? __pfx_kthread+0x10/0x10
[ 3029.668898]  ret_from_fork_asm+0x1a/0x30
[ 3029.668910]  </TASK>

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes since v1:
- Skip freeing the bhie tables only during suspend/hibernation
- Handle a changed fbc_image size correctly
- Stop setting fbc_image to NULL in *free_bhie_table()

Changes since v2:
- Remove the new mhi_partial_unprepare_after_power_down() and instead
  update mhi_power_down_keep_dev() to use
  mhi_power_down_unprepare_keep_dev() as suggested by Mani
- Update all users of this API such as ath12k (previously only ath11k
  was updated)
- Define prev_fw_sz in docs
- Improve the alignment of comments

Tested on ath11k.
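
As a reading aid for the allocation failure quoted above: "order:7" means
the page allocator was asked for 2^7 = 128 physically contiguous pages.
Assuming a 4 KiB base page size (an assumption about the reporting
machine, not something stated in the log), that is 512 KiB in one
contiguous block, which is hard to find on a fragmented system. The
helper names below are hypothetical and are not kernel functions; this
is only a sketch of the arithmetic.

```c
#include <stddef.h>

/* Bytes covered by one buddy-allocator block: page_size * 2^order. */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}

/*
 * Smallest order whose block is at least `bytes` large.
 * Hypothetical helper for illustration; it mirrors the idea of
 * rounding a request up to a power-of-two number of pages.
 */
static unsigned int bytes_to_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;

	while (order_to_bytes(order, page_size) < bytes)
		order++;
	return order;
}
```

With these helpers, order_to_bytes(7, 4096) is 524288 bytes, so any
request just above 256 KiB already rounds up to an order-7 block.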
---
 drivers/bus/mhi/host/boot.c           | 15 +++++++++++----
 drivers/bus/mhi/host/init.c           |  5 +++--
 drivers/bus/mhi/host/pm.c             |  9 +++++++++
 drivers/net/wireless/ath/ath11k/mhi.c |  8 ++++----
 drivers/net/wireless/ath/ath12k/mhi.c |  8 ++++----
 include/linux/mhi.h                   |  2 ++
 6 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index efa3b6dddf4d2..bc8459798bbee 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -584,10 +584,17 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 	 * device transitioning into MHI READY state
 	 */
 	if (fw_load_type == MHI_FW_LOAD_FBC) {
-		ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
-		if (ret) {
-			release_firmware(firmware);
-			goto error_fw_load;
+		if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) {
+			mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
+			mhi_cntrl->fbc_image = NULL;
+		}
+		if (!mhi_cntrl->fbc_image) {
+			ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
+			if (ret) {
+				release_firmware(firmware);
+				goto error_fw_load;
+			}
+			mhi_cntrl->prev_fw_sz = fw_sz;
 		}
 
 		/* Load the firmware into BHIE vec table */
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 13e7a55f54ff4..a7663ad16bfc6 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
 		/*
 		 * Allocate RDDM table for debugging purpose if specified
 		 */
-		mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
-				     mhi_cntrl->rddm_size);
+		if (!mhi_cntrl->rddm_image)
+			mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
+					     mhi_cntrl->rddm_size);
 		if (mhi_cntrl->rddm_image) {
 			ret = mhi_rddm_prepare(mhi_cntrl,
 					       mhi_cntrl->rddm_image);
diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c
index e6c3ff62bab1d..b726b000d8a5d 100644
--- a/drivers/bus/mhi/host/pm.c
+++ b/drivers/bus/mhi/host/pm.c
@@ -1259,10 +1259,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful)
 }
 EXPORT_SYMBOL_GPL(mhi_power_down);
 
+void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
+{
+	mhi_cntrl->bhi = NULL;
+	mhi_cntrl->bhie = NULL;
+
+	mhi_deinit_dev_ctxt(mhi_cntrl);
+}
+
 void mhi_power_down_keep_dev(struct mhi_controller *mhi_cntrl,
 			       bool graceful)
 {
 	__mhi_power_down(mhi_cntrl, graceful, false);
+	mhi_power_down_unprepare_keep_dev(mhi_cntrl);
 }
 EXPORT_SYMBOL_GPL(mhi_power_down_keep_dev);
 
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c
index acd76e9392d31..c5dc776b23643 100644
--- a/drivers/net/wireless/ath/ath11k/mhi.c
+++ b/drivers/net/wireless/ath/ath11k/mhi.c
@@ -460,12 +460,12 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath11k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true);
-	else
+	} else {
 		mhi_power_down(ab_pci->mhi_ctrl, true);
-
-	mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+		mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+	}
 }
 
 int ath11k_mhi_suspend(struct ath11k_pci *ab_pci)
diff --git a/drivers/net/wireless/ath/ath12k/mhi.c b/drivers/net/wireless/ath/ath12k/mhi.c
index 08f44baf182a5..cb7f789d873f2 100644
--- a/drivers/net/wireless/ath/ath12k/mhi.c
+++ b/drivers/net/wireless/ath/ath12k/mhi.c
@@ -635,12 +635,12 @@ void ath12k_mhi_stop(struct ath12k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath12k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF_KEEP_DEV);
-	else
+	} else {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF);
-
-	ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+	}
 }
 
 void ath12k_mhi_suspend(struct ath12k_pci *ab_pci)
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index dd372b0123a6d..6fd218a877855 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -306,6 +306,7 @@ struct mhi_controller_config {
  *           if fw_image is NULL and fbc_download is true (optional)
  * @fw_sz: Firmware image data size for normal booting, used only if fw_image
  *         is NULL and fbc_download is true (optional)
+ * @prev_fw_sz: Previous firmware image data size, when fbc_download is true
  * @edl_image: Firmware image name for emergency download mode (optional)
  * @rddm_size: RAM dump size that host should allocate for debugging purpose
  * @sbl_size: SBL image size downloaded through BHIe (optional)
@@ -382,6 +383,7 @@ struct mhi_controller {
 	const char *fw_image;
 	const u8 *fw_data;
 	size_t fw_sz;
+	size_t prev_fw_sz;
 	const char *edl_image;
 	size_t rddm_size;
 	size_t sbl_size;
-- 
2.43.0



* [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
@ 2025-04-29 12:23 Muhammad Usama Anjum
  2025-04-29 20:55 ` Sebastian Reichel
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-04-29 12:23 UTC
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Kunwu Chan,
	Greg Kroah-Hartman, Dr. David Alan Gilbert, Troy Hanson
  Cc: kernel, Carl Vanderlip, Sumit Garg, mhi, linux-arm-msm,
	linux-kernel, linux-wireless, ath11k, ath12k

Fix the dma_direct_alloc() failure seen at resume time while allocating
the BHIe tables. A crash report shows that, under memory pressure and
fragmentation, the DMA allocation fails at resume and MHI cannot
re-initialize.

To fix this, don't free the memory at power down during suspend /
hibernation. Instead, reuse the already allocated memory after every
resume from suspend or hibernation. This patch has been tested with
both suspend/resume and hibernation.

The rddm_image size is constant for a given hardware, while the
fbc_image size depends on the firmware. If the firmware size changes,
free the old table and allocate a new one.

Here are the crash logs:

[ 3029.338587] mhi mhi0: Requested to power ON
[ 3029.338621] mhi mhi0: Power on setup success
[ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
[ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
[ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
[ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
[ 3029.668717] Call Trace:
[ 3029.668722]  <TASK>
[ 3029.668728]  dump_stack_lvl+0x4e/0x70
[ 3029.668738]  warn_alloc+0x164/0x190
[ 3029.668747]  ? srso_return_thunk+0x5/0x5f
[ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
[ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
[ 3029.668774]  __alloc_pages_noprof+0x321/0x350
[ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
[ 3029.668790]  dma_direct_alloc+0x70/0x270
[ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
[ 3029.668844]  ? srso_return_thunk+0x5/0x5f
[ 3029.668853]  process_one_work+0x17e/0x330
[ 3029.668861]  worker_thread+0x2ce/0x3f0
[ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
[ 3029.668873]  kthread+0xd2/0x100
[ 3029.668879]  ? __pfx_kthread+0x10/0x10
[ 3029.668885]  ret_from_fork+0x34/0x50
[ 3029.668892]  ? __pfx_kthread+0x10/0x10
[ 3029.668898]  ret_from_fork_asm+0x1a/0x30
[ 3029.668910]  </TASK>

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes since v1:
- Skip freeing the bhie tables only during suspend/hibernation
- Handle a changed fbc_image size correctly
- Stop setting fbc_image to NULL in *free_bhie_table()

Changes since v2:
- Remove the new mhi_partial_unprepare_after_power_down() and instead
  update mhi_power_down_keep_dev() to use
  mhi_power_down_unprepare_keep_dev() as suggested by Mani
- Update all users of this API such as ath12k (previously only ath11k
  was updated)
- Define prev_fw_sz in docs
- Improve the alignment of comments

Tested on ath11k.
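
The reuse policy described in the commit message (keep the fbc_image
table across suspend, reallocate only when the firmware size changes)
can be sketched in plain userspace C. This is an illustration under
stated assumptions, not the driver code: struct fw_cache, fw_cache_get()
and the allocs counter are hypothetical names, and malloc()/free() stand
in for mhi_alloc_bhie_table()/mhi_free_bhie_table(), which allocate DMA
memory in the driver.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fields the patch touches. */
struct fw_cache {
	void *image;         /* models mhi_cntrl->fbc_image */
	size_t prev_sz;      /* models mhi_cntrl->prev_fw_sz */
	unsigned int allocs; /* counts real allocations, illustration only */
};

/* Returns 0 on success, -1 if a needed allocation fails. */
static int fw_cache_get(struct fw_cache *c, size_t fw_sz)
{
	/* A buffer of the wrong size cannot be reused: drop it. */
	if (c->image && fw_sz != c->prev_sz) {
		free(c->image);
		c->image = NULL;
	}

	/* Allocate only when nothing reusable is left. */
	if (!c->image) {
		c->image = malloc(fw_sz);
		if (!c->image)
			return -1;
		c->prev_sz = fw_sz;
		c->allocs++;
	}
	return 0;
}
```

Calling fw_cache_get() twice with the same size performs a single
allocation; a different size frees the old buffer and allocates again,
mirroring the prev_fw_sz check in the boot.c hunk.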
---
 drivers/bus/mhi/host/boot.c           | 15 +++++++++++----
 drivers/bus/mhi/host/init.c           |  5 +++--
 drivers/bus/mhi/host/pm.c             |  9 +++++++++
 drivers/net/wireless/ath/ath11k/mhi.c |  8 ++++----
 drivers/net/wireless/ath/ath12k/mhi.c |  8 ++++----
 include/linux/mhi.h                   |  2 ++
 6 files changed, 33 insertions(+), 14 deletions(-)

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index efa3b6dddf4d2..bc8459798bbee 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -584,10 +584,17 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 	 * device transitioning into MHI READY state
 	 */
 	if (fw_load_type == MHI_FW_LOAD_FBC) {
-		ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
-		if (ret) {
-			release_firmware(firmware);
-			goto error_fw_load;
+		if (mhi_cntrl->fbc_image && fw_sz != mhi_cntrl->prev_fw_sz) {
+			mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
+			mhi_cntrl->fbc_image = NULL;
+		}
+		if (!mhi_cntrl->fbc_image) {
+			ret = mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->fbc_image, fw_sz);
+			if (ret) {
+				release_firmware(firmware);
+				goto error_fw_load;
+			}
+			mhi_cntrl->prev_fw_sz = fw_sz;
 		}
 
 		/* Load the firmware into BHIE vec table */
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 13e7a55f54ff4..a7663ad16bfc6 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -1173,8 +1173,9 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
 		/*
 		 * Allocate RDDM table for debugging purpose if specified
 		 */
-		mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
-				     mhi_cntrl->rddm_size);
+		if (!mhi_cntrl->rddm_image)
+			mhi_alloc_bhie_table(mhi_cntrl, &mhi_cntrl->rddm_image,
+					     mhi_cntrl->rddm_size);
 		if (mhi_cntrl->rddm_image) {
 			ret = mhi_rddm_prepare(mhi_cntrl,
 					       mhi_cntrl->rddm_image);
diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c
index e6c3ff62bab1d..b726b000d8a5d 100644
--- a/drivers/bus/mhi/host/pm.c
+++ b/drivers/bus/mhi/host/pm.c
@@ -1259,10 +1259,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful)
 }
 EXPORT_SYMBOL_GPL(mhi_power_down);
 
+void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
+{
+	mhi_cntrl->bhi = NULL;
+	mhi_cntrl->bhie = NULL;
+
+	mhi_deinit_dev_ctxt(mhi_cntrl);
+}
+
 void mhi_power_down_keep_dev(struct mhi_controller *mhi_cntrl,
 			       bool graceful)
 {
 	__mhi_power_down(mhi_cntrl, graceful, false);
+	mhi_power_down_unprepare_keep_dev(mhi_cntrl);
 }
 EXPORT_SYMBOL_GPL(mhi_power_down_keep_dev);
 
diff --git a/drivers/net/wireless/ath/ath11k/mhi.c b/drivers/net/wireless/ath/ath11k/mhi.c
index acd76e9392d31..c5dc776b23643 100644
--- a/drivers/net/wireless/ath/ath11k/mhi.c
+++ b/drivers/net/wireless/ath/ath11k/mhi.c
@@ -460,12 +460,12 @@ void ath11k_mhi_stop(struct ath11k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath11k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		mhi_power_down_keep_dev(ab_pci->mhi_ctrl, true);
-	else
+	} else {
 		mhi_power_down(ab_pci->mhi_ctrl, true);
-
-	mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+		mhi_unprepare_after_power_down(ab_pci->mhi_ctrl);
+	}
 }
 
 int ath11k_mhi_suspend(struct ath11k_pci *ab_pci)
diff --git a/drivers/net/wireless/ath/ath12k/mhi.c b/drivers/net/wireless/ath/ath12k/mhi.c
index 08f44baf182a5..cb7f789d873f2 100644
--- a/drivers/net/wireless/ath/ath12k/mhi.c
+++ b/drivers/net/wireless/ath/ath12k/mhi.c
@@ -635,12 +635,12 @@ void ath12k_mhi_stop(struct ath12k_pci *ab_pci, bool is_suspend)
 	 * workaround, otherwise ath12k_core_resume() will timeout
 	 * during resume.
 	 */
-	if (is_suspend)
+	if (is_suspend) {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF_KEEP_DEV);
-	else
+	} else {
 		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_POWER_OFF);
-
-	ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+		ath12k_mhi_set_state(ab_pci, ATH12K_MHI_DEINIT);
+	}
 }
 
 void ath12k_mhi_suspend(struct ath12k_pci *ab_pci)
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index dd372b0123a6d..6fd218a877855 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -306,6 +306,7 @@ struct mhi_controller_config {
  *           if fw_image is NULL and fbc_download is true (optional)
  * @fw_sz: Firmware image data size for normal booting, used only if fw_image
  *         is NULL and fbc_download is true (optional)
+ * @prev_fw_sz: Previous firmware image data size, when fbc_download is true
  * @edl_image: Firmware image name for emergency download mode (optional)
  * @rddm_size: RAM dump size that host should allocate for debugging purpose
  * @sbl_size: SBL image size downloaded through BHIe (optional)
@@ -382,6 +383,7 @@ struct mhi_controller {
 	const char *fw_image;
 	const u8 *fw_data;
 	size_t fw_sz;
+	size_t prev_fw_sz;
 	const char *edl_image;
 	size_t rddm_size;
 	size_t sbl_size;
-- 
2.43.0



* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-04-29 12:23 [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation Muhammad Usama Anjum
@ 2025-04-29 20:55 ` Sebastian Reichel
  2025-04-30 14:29 ` kernel test robot
  2025-04-30 16:45 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: Sebastian Reichel @ 2025-04-29 20:55 UTC
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Kunwu Chan, Greg Kroah-Hartman,
	Dr. David Alan Gilbert, Troy Hanson, kernel, Carl Vanderlip,
	Sumit Garg, mhi, linux-arm-msm, linux-kernel, linux-wireless,
	ath11k, ath12k

Hi,

On Tue, Apr 29, 2025 at 05:23:35PM +0500, Muhammad Usama Anjum wrote:
> Fix the dma_direct_alloc() failure seen at resume time while allocating
> the BHIe tables. A crash report shows that, under memory pressure and
> fragmentation, the DMA allocation fails at resume and MHI cannot
> re-initialize.
> 
> To fix this, don't free the memory at power down during suspend /
> hibernation. Instead, reuse the already allocated memory after every
> resume from suspend or hibernation. This patch has been tested with
> both suspend/resume and hibernation.
> 
> The rddm_image size is constant for a given hardware, while the
> fbc_image size depends on the firmware. If the firmware size changes,
> free the old table and allocate a new one.
> 
> Here are the crash logs:
> 
> [ 3029.338587] mhi mhi0: Requested to power ON
> [ 3029.338621] mhi mhi0: Power on setup success
> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> [ 3029.668717] Call Trace:
> [ 3029.668722]  <TASK>
> [ 3029.668728]  dump_stack_lvl+0x4e/0x70
> [ 3029.668738]  warn_alloc+0x164/0x190
> [ 3029.668747]  ? srso_return_thunk+0x5/0x5f
> [ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
> [ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> [ 3029.668774]  __alloc_pages_noprof+0x321/0x350
> [ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> [ 3029.668790]  dma_direct_alloc+0x70/0x270
> [ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668844]  ? srso_return_thunk+0x5/0x5f
> [ 3029.668853]  process_one_work+0x17e/0x330
> [ 3029.668861]  worker_thread+0x2ce/0x3f0
> [ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
> [ 3029.668873]  kthread+0xd2/0x100
> [ 3029.668879]  ? __pfx_kthread+0x10/0x10
> [ 3029.668885]  ret_from_fork+0x34/0x50
> [ 3029.668892]  ? __pfx_kthread+0x10/0x10
> [ 3029.668898]  ret_from_fork_asm+0x1a/0x30
> [ 3029.668910]  </TASK>
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---

This breaks ath12k on my T14s Snapdragon with WCN785x. After a
suspend/resume cycle the following shows up in my logs (and the resume
is very slow). Additionally, at shutdown ath12k crashes with a
NULL pointer dereference in mhi_deinit_dev_ctxt(), called from
mhi_unprepare_after_power_down(), which in turn is called from
ath12k_mhi_stop(). This happens after the filesystems are unmounted, and
I don't currently have anything set up to capture logs from that point,
so it is not included in the suspend/resume log below:

...
[   28.385370] ath12k_pci 0004:01:00.0: failed to set mhi state INIT(0) in current mhi state (0x1)
[   28.385379] ath12k_pci 0004:01:00.0: failed to set mhi state: INIT(0)
[   28.385383] ath12k_pci 0004:01:00.0: failed to start mhi: -22
[   28.385387] ath12k_pci 0004:01:00.0: failed to power up hif during resume: -22
[   28.385391] ath12k_pci 0004:01:00.0: failed to early resume core: -22
[   28.385393] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume_early returns -22
[   28.385413] ath12k_pci 0004:01:00.0: PM: failed to resume async early: error -22
[   28.385513] qcom_mhi_qrtr mhi0_IPCR: Current EE: DISABLE Required EE Mask: 0x4
[   28.385521] qcom_mhi_qrtr mhi0_IPCR: failed to prepare for autoqueue transfer -107
[   28.385526] qcom_mhi_qrtr mhi0_IPCR: PM: dpm_run_callback(): qcom_mhi_qrtr_pm_resume_early [qrtr_mhi] returns -107
[   28.385541] qcom_mhi_qrtr mhi0_IPCR: PM: failed to resume early: error -107
[   50.146823] ath12k_pci 0004:01:00.0: timeout while waiting for restart complete
[   50.146830] ath12k_pci 0004:01:00.0: failed to resume core: -110
[   50.146834] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110
[   50.146849] ath12k_pci 0004:01:00.0: PM: failed to resume async: error -110
[   53.218794] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[   53.218801] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[   53.218808] ath12k_pci 0004:01:00.0: failed to set ac override for ARP: -11
[   53.218813] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
[   53.218817] ------------[ cut here ]------------
[   53.218820] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[   53.218855] WARNING: CPU: 2 PID: 1958 at net/mac80211/util.c:1829 ieee80211_reconfig+0x37c/0x1718 [mac80211]
[   53.218936] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[   53.219100]  ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[   53.219234] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Not tainted 6.15.0-rc4+ #95 PREEMPT 
[   53.219241] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[   53.219245] Workqueue: async async_run_entry_fn
[   53.219258] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   53.219265] pc : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[   53.219315] lr : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[   53.219362] sp : ffff8000853ebb30
[   53.219364] x29: ffff8000853ebbf0 x28: 0000000000000000 x27: 0000000000000000
[   53.219373] x26: ffff1ce140047428 x25: 0000000000000000 x24: ffff1ce1408f7c05
[   53.219380] x23: ffff1ce14aaa05b8 x22: 0000000000000010 x21: 00000000fffffff5
[   53.219387] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[   53.219394] x17: 72617774666f7320 x16: 6120656220646c75 x15: 6f63207369685420
[   53.219401] x14: 2e656d7573657220 x13: 0a2e657573736920 x12: 6572617764726168
[   53.219408] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffdacf6aa7749c
[   53.219415] x8 : 0000000000000507 x7 : ffffdacf6d031138 x6 : ffffdacf6d031138
[   53.219422] x5 : ffff1ce8bbe76508 x4 : 0000000000000000 x3 : ffff42194efd1000
[   53.219429] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1ce149caa300
[   53.219437] Call trace:
[   53.219440]  ieee80211_reconfig+0x37c/0x1718 [mac80211] (P)
[   53.219490]  ieee80211_resume+0x54/0x78 [mac80211]
[   53.219541]  wiphy_resume+0x8c/0x200 [cfg80211]
[   53.219603]  dpm_run_callback+0x50/0x188
[   53.219614]  device_resume+0xc4/0x1f8
[   53.219621]  async_resume+0x2c/0x50
[   53.219628]  async_run_entry_fn+0x3c/0x160
[   53.219634]  process_one_work+0x158/0x3c8
[   53.219643]  worker_thread+0x2e0/0x418
[   53.219650]  kthread+0x14c/0x230
[   53.219657]  ret_from_fork+0x10/0x20
[   53.219666] ---[ end trace 0000000000000000 ]---
[   53.220154] ------------[ cut here ]------------
[   53.220158] WARNING: CPU: 2 PID: 1958 at net/mac80211/driver-ops.c:41 drv_stop+0x1cc/0x1e8 [mac80211]
[   53.220235] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[   53.220351]  ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[   53.220444] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Tainted: G        W           6.15.0-rc4+ #95 PREEMPT 
[   53.220452] Tainted: [W]=WARN
[   53.220455] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[   53.220458] Workqueue: async async_run_entry_fn
[   53.220467] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[   53.220472] pc : drv_stop+0x1cc/0x1e8 [mac80211]
[   53.220521] lr : ieee80211_stop_device+0x8c/0xa8 [mac80211]
[   53.220580] sp : ffff8000853eb9f0
[   53.220582] x29: ffff8000853eb9f0 x28: 0000000000000000 x27: 0000000000000000
[   53.220591] x26: ffff1ce140047428 x25: ffff8000853eba50 x24: ffff8000853eba50
[   53.220598] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[   53.220604] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[   53.220611] x17: ffff42194efd1000 x16: ffff800080010000 x15: 6f63207369685420
[   53.220618] x14: 000000000000037f x13: 000000000000037f x12: 071c71c71c71c71c
[   53.220625] x11: ffff1ce8bbe88b8c x10: 1f0348adc6bb8584 x9 : ffffdacf67622b3c
[   53.220633] x8 : ffff1ce149e1e550 x7 : 0000000000000000 x6 : 000000000000003f
[   53.220640] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000003
[   53.220646] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[   53.220652] Call trace:
[   53.220654]  drv_stop+0x1cc/0x1e8 [mac80211] (P)
[   53.220702]  ieee80211_stop_device+0x8c/0xa8 [mac80211]
[   53.220751]  ieee80211_do_stop+0x644/0x830 [mac80211]
[   53.220798]  ieee80211_stop+0x60/0x1b0 [mac80211]
[   53.220845]  __dev_close_many+0xbc/0x1f0
[   53.220857]  dev_close_many+0x94/0x160
[   53.220863]  netif_close+0x78/0xa0
[   53.220868]  dev_close+0x3c/0x70
[   53.220876]  cfg80211_shutdown_all_interfaces+0x4c/0x118 [cfg80211]
[   53.220935]  wiphy_resume+0xc0/0x200 [cfg80211]
[   53.220985]  dpm_run_callback+0x50/0x188
[   53.220992]  device_resume+0xc4/0x1f8
[   53.220999]  async_resume+0x2c/0x50
[   53.221006]  async_run_entry_fn+0x3c/0x160
[   53.221012]  process_one_work+0x158/0x3c8
[   53.221020]  worker_thread+0x2e0/0x418
[   53.221027]  kthread+0x14c/0x230
[   53.221033]  ret_from_fork+0x10/0x20
[   53.221039] ---[ end trace 0000000000000000 ]---
[   53.221223] ieee80211 phy0: PM: dpm_run_callback(): wiphy_resume [cfg80211] returns -11
[   53.221277] ieee80211 phy0: PM: failed to resume async: error -11
[   53.667179] OOM killer enabled.
[   53.667182] Restarting tasks ... done.
[   53.668270] random: crng reseeded on system resumption
[   53.668317] PM: suspend exit
[   56.804822] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[   56.804845] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[   56.804859] ath12k_pci 0004:01:00.0: failed to enable PMF QOS: (-11
[   56.804872] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
...
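
One possible reading of the first error line in the log above ("failed
to set mhi state INIT(0) in current mhi state (0x1)") is that the
driver's INIT bit is still set when resume tries to initialise again,
because the patched suspend path no longer runs the DEINIT step. This
has not been confirmed in this thread. The sketch below uses
hypothetical names (set_state(), ST_INIT, ST_POWER_ON, enum op) and
deliberately simplified rules; it is not the actual ath12k code.

```c
#include <stdbool.h>

/* Simplified model: bit 0 = initialised, bit 1 = powered on. */
enum { ST_INIT = 1u << 0, ST_POWER_ON = 1u << 1 };

enum op { OP_INIT, OP_DEINIT, OP_POWER_ON, OP_POWER_OFF };

/*
 * Apply one transition to *state.
 * Returns false and leaves *state unchanged if the transition is not
 * allowed from the current state.
 */
static bool set_state(unsigned int *state, enum op op)
{
	switch (op) {
	case OP_INIT:
		if (*state & ST_INIT)
			return false;	/* already initialised */
		*state |= ST_INIT;
		return true;
	case OP_DEINIT:
		if (!(*state & ST_INIT) || (*state & ST_POWER_ON))
			return false;
		*state &= ~ST_INIT;
		return true;
	case OP_POWER_ON:
		if (!(*state & ST_INIT) || (*state & ST_POWER_ON))
			return false;
		*state |= ST_POWER_ON;
		return true;
	case OP_POWER_OFF:
		if (!(*state & ST_POWER_ON))
			return false;
		*state &= ~ST_POWER_ON;
		return true;
	}
	return false;
}
```

In this model, INIT, POWER_ON, POWER_OFF leaves the state at 0x1, and a
following INIT is rejected. Running DEINIT after POWER_OFF, as the
unpatched ath12k_mhi_stop() does on every path, returns the state to 0x0
so that INIT succeeds again.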

-- Sebastian


* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-04-29 12:23 [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation Muhammad Usama Anjum
  2025-04-29 20:55 ` Sebastian Reichel
@ 2025-04-30 14:29 ` kernel test robot
  2025-04-30 16:45 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-04-30 14:29 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen, Kunwu Chan,
	Greg Kroah-Hartman, Dr. David Alan Gilbert, Troy Hanson
  Cc: oe-kbuild-all, kernel, Carl Vanderlip, Sumit Garg, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k, ath12k

Hi Muhammad,

kernel test robot noticed the following build warnings:

[auto build test WARNING on ath/ath-next]
[also build test WARNING on next-20250430]
[cannot apply to mani-mhi/mhi-next char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus staging/staging-testing staging/staging-next staging/staging-linus usb/usb-testing usb/usb-next usb/usb-linus linus/master v6.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Muhammad-Usama-Anjum/bus-mhi-host-don-t-free-bhie-tables-during-suspend-hibernation/20250429-202649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git ath-next
patch link:    https://lore.kernel.org/r/20250429122351.108684-1-usama.anjum%40collabora.com
patch subject: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
config: arm-randconfig-001-20250430 (https://download.01.org/0day-ci/archive/20250430/202504302208.7JSH4wb6-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 10.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250430/202504302208.7JSH4wb6-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504302208.7JSH4wb6-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/bus/mhi/host/pm.c:1246:6: warning: no previous prototype for 'mhi_power_down_unprepare_keep_dev' [-Wmissing-prototypes]
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


vim +/mhi_power_down_unprepare_keep_dev +1246 drivers/bus/mhi/host/pm.c

  1245	
> 1246	void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
  1247	{
  1248		mhi_cntrl->bhi = NULL;
  1249		mhi_cntrl->bhie = NULL;
  1250	
  1251		mhi_deinit_dev_ctxt(mhi_cntrl);
  1252	}
  1253	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
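
A `-Wmissing-prototypes` warning like the one reported above is usually resolved in one of two ways: mark the function `static` if it is only used inside its own file, or declare a prototype in a header that the file includes before the definition. The following is a minimal user-space sketch of the second option. The struct and function names ending in `_sketch` are hypothetical stand-ins, not the driver's real definitions, and the thread does not say which header the real prototype belongs in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mhi_controller so this compiles
 * outside the kernel tree. */
struct mhi_controller_sketch {
	void *bhi;
	void *bhie;
	int dev_ctxt_live; /* stands in for the device context state */
};

/* In a real tree this prototype would live in a shared header.
 * Having it visible before the definition is what silences
 * -Wmissing-prototypes for a non-static function. */
void power_down_unprepare_keep_dev_sketch(struct mhi_controller_sketch *c);

void power_down_unprepare_keep_dev_sketch(struct mhi_controller_sketch *c)
{
	assert(c != NULL);

	c->bhi = NULL;
	c->bhie = NULL;
	c->dev_ctxt_live = 0; /* placeholder for mhi_deinit_dev_ctxt() */
}
```

If the function has no callers outside its file, adding `static` to the definition is the smaller change and is what the clang note suggests.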

* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-04-29 12:23 [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation Muhammad Usama Anjum
  2025-04-29 20:55 ` Sebastian Reichel
  2025-04-30 14:29 ` kernel test robot
@ 2025-04-30 16:45 ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2025-04-30 16:45 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen, Kunwu Chan,
	Greg Kroah-Hartman, Dr. David Alan Gilbert, Troy Hanson
  Cc: llvm, oe-kbuild-all, kernel, Carl Vanderlip, Sumit Garg, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k, ath12k

Hi Muhammad,

kernel test robot noticed the following build warnings:

[auto build test WARNING on ath/ath-next]
[also build test WARNING on next-20250430]
[cannot apply to mani-mhi/mhi-next char-misc/char-misc-testing char-misc/char-misc-next char-misc/char-misc-linus staging/staging-testing staging/staging-next staging/staging-linus usb/usb-testing usb/usb-next usb/usb-linus linus/master v6.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Muhammad-Usama-Anjum/bus-mhi-host-don-t-free-bhie-tables-during-suspend-hibernation/20250429-202649
base:   https://git.kernel.org/pub/scm/linux/kernel/git/ath/ath.git ath-next
patch link:    https://lore.kernel.org/r/20250429122351.108684-1-usama.anjum%40collabora.com
patch subject: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
config: arm64-randconfig-002-20250430 (https://download.01.org/0day-ci/archive/20250501/202505010037.1PMLamw8-lkp@intel.com/config)
compiler: clang version 21.0.0git (https://github.com/llvm/llvm-project f819f46284f2a79790038e1f6649172789734ae8)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250501/202505010037.1PMLamw8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505010037.1PMLamw8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/bus/mhi/host/pm.c:1246:6: warning: no previous prototype for function 'mhi_power_down_unprepare_keep_dev' [-Wmissing-prototypes]
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         |      ^
   drivers/bus/mhi/host/pm.c:1246:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
    1246 | void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
         | ^
         | static 
   1 warning generated.


vim +/mhi_power_down_unprepare_keep_dev +1246 drivers/bus/mhi/host/pm.c

  1245	
> 1246	void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl)
  1247	{
  1248		mhi_cntrl->bhi = NULL;
  1249		mhi_cntrl->bhie = NULL;
  1250	
  1251		mhi_deinit_dev_ctxt(mhi_cntrl);
  1252	}
  1253	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-04-29 12:20 Muhammad Usama Anjum
@ 2025-05-01 16:00 ` Greg Kroah-Hartman
  2025-05-02  4:15   ` Muhammad Usama Anjum
  0 siblings, 1 reply; 8+ messages in thread
From: Greg Kroah-Hartman @ 2025-05-01 16:00 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alex Elder, Jacek Lawrynowicz,
	Kunwu Chan, Troy Hanson, Dr. David Alan Gilbert, kernel, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k, ath12k

On Tue, Apr 29, 2025 at 05:20:56PM +0500, Muhammad Usama Anjum wrote:
> Fix dma_direct_alloc() failure at resume time during bhie_table
> allocation. There is a crash report where at resume time, the memory
> from the dma doesn't get allocated and MHI fails to re-initialize.
> There is fragmentation/memory pressure.
> 
> To fix it, don't free the memory at power down during suspend /
> hibernation. Instead, use the same allocated memory again after every
> resume / hibernation. This patch has been tested with resume and
> hibernation both.
> 
> The rddm is of constant size for a given hardware. While the fbc_image
> size depends on the firmware. If the firmware changes, we'll free and
> allocate new memory for it.
> 
> Here are the crash logs:
> 
> [ 3029.338587] mhi mhi0: Requested to power ON
> [ 3029.338621] mhi mhi0: Power on setup success
> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> [ 3029.668717] Call Trace:
> [ 3029.668722]  <TASK>
> [ 3029.668728]  dump_stack_lvl+0x4e/0x70
> [ 3029.668738]  warn_alloc+0x164/0x190
> [ 3029.668747]  ? srso_return_thunk+0x5/0x5f
> [ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
> [ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> [ 3029.668774]  __alloc_pages_noprof+0x321/0x350
> [ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> [ 3029.668790]  dma_direct_alloc+0x70/0x270
> [ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668844]  ? srso_return_thunk+0x5/0x5f
> [ 3029.668853]  process_one_work+0x17e/0x330
> [ 3029.668861]  worker_thread+0x2ce/0x3f0
> [ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
> [ 3029.668873]  kthread+0xd2/0x100
> [ 3029.668879]  ? __pfx_kthread+0x10/0x10
> [ 3029.668885]  ret_from_fork+0x34/0x50
> [ 3029.668892]  ? __pfx_kthread+0x10/0x10
> [ 3029.668898]  ret_from_fork_asm+0x1a/0x30
> [ 3029.668910]  </TASK>
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>

What commit id does this fix?  Should it go to stable kernel(s)?  If so,
how far back?

thanks,

greg k-h
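
On the `order:7` in the quoted allocation failure: the order is the base-2 logarithm of the number of physically contiguous pages requested, so order 7 means 2^7 = 128 pages in one block. The sketch below works out the byte size. The 4 KiB page size is an assumption (typical for x86-64, but not stated in the thread), and `order_to_bytes` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; 4 KiB is typical on x86-64 but is not stated
 * anywhere in this thread. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bytes covered by a single allocation request of the given order. */
unsigned long order_to_bytes(unsigned int order)
{
	assert(order < 32);
	return SKETCH_PAGE_SIZE << order;
}
```

Under that assumption `order_to_bytes(7)` is 524288 bytes (512 KiB), which has to be found as one contiguous block. That is the kind of request that becomes hard to satisfy once memory is fragmented.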

* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-05-01 16:00 ` Greg Kroah-Hartman
@ 2025-05-02  4:15   ` Muhammad Usama Anjum
  2025-05-02  5:06     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 8+ messages in thread
From: Muhammad Usama Anjum @ 2025-05-02  4:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: usama.anjum, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alex Elder,
	Jacek Lawrynowicz, Kunwu Chan, Troy Hanson,
	Dr. David Alan Gilbert, kernel, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, ath12k

Hi Greg,

On 5/1/25 9:00 PM, Greg Kroah-Hartman wrote:
> On Tue, Apr 29, 2025 at 05:20:56PM +0500, Muhammad Usama Anjum wrote:
>> Fix dma_direct_alloc() failure at resume time during bhie_table
>> allocation. There is a crash report where at resume time, the memory
>> from the dma doesn't get allocated and MHI fails to re-initialize.
>> There is fragmentation/memory pressure.
>>
>> To fix it, don't free the memory at power down during suspend /
>> hibernation. Instead, use the same allocated memory again after every
>> resume / hibernation. This patch has been tested with resume and
>> hibernation both.
>>
>> The rddm is of constant size for a given hardware. While the fbc_image
>> size depends on the firmware. If the firmware changes, we'll free and
>> allocate new memory for it.
>>
>> Here are the crash logs:
>>
>> [ 3029.338587] mhi mhi0: Requested to power ON
>> [ 3029.338621] mhi mhi0: Power on setup success
>> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
>> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
>> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
>> [ 3029.668717] Call Trace:
>> [ 3029.668722]  <TASK>
>> [ 3029.668728]  dump_stack_lvl+0x4e/0x70
>> [ 3029.668738]  warn_alloc+0x164/0x190
>> [ 3029.668747]  ? srso_return_thunk+0x5/0x5f
>> [ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
>> [ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
>> [ 3029.668774]  __alloc_pages_noprof+0x321/0x350
>> [ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
>> [ 3029.668790]  dma_direct_alloc+0x70/0x270
>> [ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
>> [ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
>> [ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
>> [ 3029.668844]  ? srso_return_thunk+0x5/0x5f
>> [ 3029.668853]  process_one_work+0x17e/0x330
>> [ 3029.668861]  worker_thread+0x2ce/0x3f0
>> [ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
>> [ 3029.668873]  kthread+0xd2/0x100
>> [ 3029.668879]  ? __pfx_kthread+0x10/0x10
>> [ 3029.668885]  ret_from_fork+0x34/0x50
>> [ 3029.668892]  ? __pfx_kthread+0x10/0x10
>> [ 3029.668898]  ret_from_fork_asm+0x1a/0x30
>> [ 3029.668910]  </TASK>
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> 
> What commit id does this fix?  Should it go to stable kernel(s)?  If so,
> how far back?
This patch avoids the dma_alloc_coherent() failure that occurs when the
system is under memory pressure and cannot allocate the memory. It's not
a bug in the allocation API or the driver, so I think it should be
considered an improvement rather than a fix. Please correct me if I'm wrong.

-- 
Regards,
Usama
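
The reuse strategy from the patch description (keep the table across suspend/resume, and free and reallocate only when the required size changes, for example after a firmware change) can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the driver's code: `malloc()`/`free()` stand in for the DMA-coherent allocator, and `struct table_cache`, `table_get` and `table_release` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache for one image table; not the driver's structure. */
struct table_cache {
	void    *buf;    /* stand-in for a DMA-coherent buffer */
	size_t   size;   /* size the buffer was allocated for */
	unsigned allocs; /* count of real allocations, for illustration */
};

/* Return a buffer of `size` bytes, reusing the cached one when the
 * size is unchanged. */
void *table_get(struct table_cache *tc, size_t size)
{
	assert(tc != NULL && size > 0);

	if (tc->buf && tc->size == size)
		return tc->buf; /* resume path: no new allocation */

	/* Size changed: free the old buffer and allocate a new one. */
	free(tc->buf);
	tc->buf = malloc(size);
	tc->size = tc->buf ? size : 0;
	if (tc->buf)
		tc->allocs++;
	return tc->buf;
}

/* Intended for final teardown only, not for suspend. */
void table_release(struct table_cache *tc)
{
	assert(tc != NULL);
	free(tc->buf);
	tc->buf = NULL;
	tc->size = 0;
}
```

Calling `table_get()` twice with the same size returns the same pointer and performs one allocation, so the resume path no longer depends on a large contiguous block being available at that moment. The trade-off is that the memory stays reserved while the device is suspended.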

* Re: [PATCH v3] bus: mhi: host: don't free bhie tables during suspend/hibernation
  2025-05-02  4:15   ` Muhammad Usama Anjum
@ 2025-05-02  5:06     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 8+ messages in thread
From: Greg Kroah-Hartman @ 2025-05-02  5:06 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alex Elder, Jacek Lawrynowicz,
	Kunwu Chan, Troy Hanson, Dr. David Alan Gilbert, kernel, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k, ath12k

On Fri, May 02, 2025 at 09:15:10AM +0500, Muhammad Usama Anjum wrote:
> Hi Greg,
> 
> On 5/1/25 9:00 PM, Greg Kroah-Hartman wrote:
> > On Tue, Apr 29, 2025 at 05:20:56PM +0500, Muhammad Usama Anjum wrote:
> >> Fix dma_direct_alloc() failure at resume time during bhie_table
> >> allocation. There is a crash report where at resume time, the memory
> >> from the dma doesn't get allocated and MHI fails to re-initialize.
> >> There is fragmentation/memory pressure.
> >>
> >> To fix it, don't free the memory at power down during suspend /
> >> hibernation. Instead, use the same allocated memory again after every
> >> resume / hibernation. This patch has been tested with resume and
> >> hibernation both.
> >>
> >> The rddm is of constant size for a given hardware. While the fbc_image
> >> size depends on the firmware. If the firmware changes, we'll free and
> >> allocate new memory for it.
> >>
> >> Here are the crash logs:
> >>
> >> [ 3029.338587] mhi mhi0: Requested to power ON
> >> [ 3029.338621] mhi mhi0: Power on setup success
> >> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> >> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
> >> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> >> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> >> [ 3029.668717] Call Trace:
> >> [ 3029.668722]  <TASK>
> >> [ 3029.668728]  dump_stack_lvl+0x4e/0x70
> >> [ 3029.668738]  warn_alloc+0x164/0x190
> >> [ 3029.668747]  ? srso_return_thunk+0x5/0x5f
> >> [ 3029.668754]  ? __alloc_pages_direct_compact+0xaf/0x360
> >> [ 3029.668761]  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> >> [ 3029.668774]  __alloc_pages_noprof+0x321/0x350
> >> [ 3029.668782]  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> >> [ 3029.668790]  dma_direct_alloc+0x70/0x270
> >> [ 3029.668796]  mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> >> [ 3029.668814]  mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> >> [ 3029.668830]  mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> >> [ 3029.668844]  ? srso_return_thunk+0x5/0x5f
> >> [ 3029.668853]  process_one_work+0x17e/0x330
> >> [ 3029.668861]  worker_thread+0x2ce/0x3f0
> >> [ 3029.668868]  ? __pfx_worker_thread+0x10/0x10
> >> [ 3029.668873]  kthread+0xd2/0x100
> >> [ 3029.668879]  ? __pfx_kthread+0x10/0x10
> >> [ 3029.668885]  ret_from_fork+0x34/0x50
> >> [ 3029.668892]  ? __pfx_kthread+0x10/0x10
> >> [ 3029.668898]  ret_from_fork_asm+0x1a/0x30
> >> [ 3029.668910]  </TASK>
> >>
> >> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> >>
> >> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> > 
> > What commit id does this fix?  Should it go to stable kernel(s)?  If so,
> > how far back?
> This patch avoids the dma_alloc_coherent() failure that occurs when the
> system is under memory pressure and cannot allocate the memory. It's not
> a bug in the allocation API or the driver, so I think it should be
> considered an improvement rather than a fix. Please correct me if I'm wrong.

You show a kernel crash in the changelog, that's a major issue (i.e.
will get assigned a CVE), so you need to show what commit id it fixes
for people to know how far back to take the fix to.

thanks,

greg k-h
