linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] bus: mhi: keep dma buffers through suspend/hibernation cycles
@ 2025-06-30  7:43 Muhammad Usama Anjum
  2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-06-30  7:43 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

When there is memory pressure during resume and no DMA memory is
available, the ath11k driver fails to resume. The driver currently
frees its DMA memory during suspend or hibernate, and attempts to
re-allocate it during resume. However, if the DMA memory has been
consumed by other software in the meantime, these allocations can
fail, leading to critical failures in the WiFi driver. It has been
reported [1].

Although I have recently fixed several instances [2] [3] to ensure
DMA memory is not freed once allocated, we continue to receive
reports of new failures.

In this series, 3 more such cases are being fixed. There are still
some cases which I'm trying to fix. They can be discussed separately.

[1] https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com
[2] https://lore.kernel.org/all/20250428080242.466901-1-usama.anjum@collabora.com
[3] https://lore.kernel.org/all/20250516184952.878726-1-usama.anjum@collabora.com

Muhammad Usama Anjum (3):
  bus: mhi: host: keep bhi buffer through suspend cycle
  bus: mhi: don't deinitialize and re-initialize again
  bus: mhi: keep device context through suspend cycles

 drivers/bus/mhi/host/boot.c            | 19 ++++++++++---------
 drivers/bus/mhi/host/init.c            | 16 +++++++++++-----
 drivers/bus/mhi/host/internal.h        |  2 ++
 drivers/net/wireless/ath/ath11k/core.c |  5 -----
 include/linux/mhi.h                    |  1 +
 5 files changed, 24 insertions(+), 19 deletions(-)

-- 
2.39.5


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle
  2025-06-30  7:43 [PATCH 0/3] bus: mhi: keep dma buffers through suspend/hibernation cycles Muhammad Usama Anjum
@ 2025-06-30  7:43 ` Muhammad Usama Anjum
  2025-07-01 10:25   ` Greg Kroah-Hartman
  2025-07-08  9:47   ` Krishna Chaitanya Chundru
  2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
  2025-06-30  7:43 ` [PATCH 3/3] bus: mhi: keep device context through suspend cycles Muhammad Usama Anjum
  2 siblings, 2 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-06-30  7:43 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

When there is memory pressure, at resume time dma_alloc_coherent()
returns error which in turn fails the loading of firmware and hence
the driver crashes:

kernel: kworker/u33:5: page allocation failure: order:7,
mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted
6.11.11-valve17-1-neptune-611-g027868a0ac03 #1
3843143b92e9da0fa2d3d5f21f51beaed15c7d59
kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
kernel: Call Trace:
kernel:  <TASK>
kernel:  dump_stack_lvl+0x4e/0x70
kernel:  warn_alloc+0x164/0x190
kernel:  ? srso_return_thunk+0x5/0x5f
kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
kernel:  __alloc_pages_noprof+0x321/0x350
kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
kernel:  dma_direct_alloc+0x70/0x270
kernel:  mhi_fw_load_handler+0x126/0x340 [mhi
a96cb91daba500cc77f86bad60c1f332dc3babdf]
kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi
a96cb91daba500cc77f86bad60c1f332dc3babdf]
kernel:  ? srso_return_thunk+0x5/0x5f
kernel:  process_one_work+0x17e/0x330
kernel:  worker_thread+0x2ce/0x3f0
kernel:  ? __pfx_worker_thread+0x10/0x10
kernel:  kthread+0xd2/0x100
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork+0x34/0x50
kernel:  ? __pfx_kthread+0x10/0x10
kernel:  ret_from_fork_asm+0x1a/0x30
kernel:  </TASK>
kernel: Mem-Info:
kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
    active_file:359315 inactive_file:2487001 isolated_file:0
    unevictable:637 dirty:19 writeback:0
    slab_reclaimable:160391 slab_unreclaimable:39729
    mapped:175836 shmem:51039 pagetables:4415
    sec_pagetables:0 bounce:0
    kernel_misc_reclaimable:0
    free:125666 free_pcp:0 free_cma:0

In above example, if we sum all the consumed memory, it comes out
to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
Even though memory is present. But all of the dma memory has been
exhausted or fragmented.

Fix it by allocating it only once and then reuse the same allocated
memory. As we'll allocate this memory only once, this memory will stay
allocated.

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Fixes: cd457afb1667 ("bus: mhi: core: Add support for downloading firmware over BHIe")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Reported here:
https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com

Still a lot of more fixes are required. Hence, I'm not adding closes tag.
---
 drivers/bus/mhi/host/boot.c     | 19 ++++++++++---------
 drivers/bus/mhi/host/init.c     |  5 +++++
 drivers/bus/mhi/host/internal.h |  2 ++
 include/linux/mhi.h             |  1 +
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index b3a85aa3c4768..11bb8c12ac597 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -302,8 +302,8 @@ static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,
 	return -EIO;
 }
 
-static void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
-				struct image_info *image_info)
+void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
+			 struct image_info *image_info)
 {
 	struct mhi_buf *mhi_buf = image_info->mhi_buf;
 
@@ -455,18 +455,19 @@ static enum mhi_fw_load_type mhi_fw_load_type_get(const struct mhi_controller *m
 
 static int mhi_load_image_bhi(struct mhi_controller *mhi_cntrl, const u8 *fw_data, size_t size)
 {
-	struct image_info *image;
+	struct image_info *image = mhi_cntrl->bhi_image;
 	int ret;
 
-	ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
-	if (ret)
-		return ret;
+	if (!image) {
+		ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
+		if (ret)
+			return ret;
 
-	/* Load the firmware into BHI vec table */
-	memcpy(image->mhi_buf->buf, fw_data, size);
+		/* Load the firmware into BHI vec table */
+		memcpy(image->mhi_buf->buf, fw_data, size);
+	}
 
 	ret = mhi_fw_load_bhi(mhi_cntrl, &image->mhi_buf[image->entries - 1]);
-	mhi_free_bhi_buffer(mhi_cntrl, image);
 
 	return ret;
 }
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 6e06e4efec765..2e0f18c939e68 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -1228,6 +1228,11 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
 		mhi_cntrl->rddm_image = NULL;
 	}
 
+	if (mhi_cntrl->bhi_image) {
+		mhi_free_bhi_buffer(mhi_cntrl, mhi_cntrl->bhi_image);
+		mhi_cntrl->bhi_image = NULL;
+	}
+
 	mhi_deinit_dev_ctxt(mhi_cntrl);
 }
 EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
index 1054e67bb450d..60b0699323375 100644
--- a/drivers/bus/mhi/host/internal.h
+++ b/drivers/bus/mhi/host/internal.h
@@ -324,6 +324,8 @@ int mhi_alloc_bhie_table(struct mhi_controller *mhi_cntrl,
 			 struct image_info **image_info, size_t alloc_size);
 void mhi_free_bhie_table(struct mhi_controller *mhi_cntrl,
 			 struct image_info *image_info);
+void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
+			 struct image_info *image_info);
 
 /* Power management APIs */
 enum mhi_pm_state __must_check mhi_tryset_pm_state(
diff --git a/include/linux/mhi.h b/include/linux/mhi.h
index 4c567907933a5..593012f779d97 100644
--- a/include/linux/mhi.h
+++ b/include/linux/mhi.h
@@ -391,6 +391,7 @@ struct mhi_controller {
 	size_t reg_len;
 	struct image_info *fbc_image;
 	struct image_info *rddm_image;
+	struct image_info *bhi_image;
 	struct mhi_chan *mhi_chan;
 	struct list_head lpm_chans;
 	int *irq;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-06-30  7:43 [PATCH 0/3] bus: mhi: keep dma buffers through suspend/hibernation cycles Muhammad Usama Anjum
  2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
@ 2025-06-30  7:43 ` Muhammad Usama Anjum
  2025-07-01 10:25   ` Greg Kroah-Hartman
                     ` (2 more replies)
  2025-06-30  7:43 ` [PATCH 3/3] bus: mhi: keep device context through suspend cycles Muhammad Usama Anjum
  2 siblings, 3 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-06-30  7:43 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

Don't deinitialize and reinitialize the HAL helpers. The dma memory is
deallocated and there is high possibility that we'll not be able to get
the same memory allocated from dma when there is high memory pressure.

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
 drivers/net/wireless/ath/ath11k/core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 4488e4cdc5e9e..bc4930fe6a367 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
 	mutex_unlock(&ab->core_lock);
 
 	ath11k_dp_free(ab);
-	ath11k_hal_srng_deinit(ab);
 
 	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
 
-	ret = ath11k_hal_srng_init(ab);
-	if (ret)
-		return ret;
-
 	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
 
 	ret = ath11k_core_qmi_firmware_ready(ab);
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-06-30  7:43 [PATCH 0/3] bus: mhi: keep dma buffers through suspend/hibernation cycles Muhammad Usama Anjum
  2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
  2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
@ 2025-06-30  7:43 ` Muhammad Usama Anjum
  2025-07-01 10:26   ` Greg Kroah-Hartman
  2025-07-08 10:15   ` Krishna Chaitanya Chundru
  2 siblings, 2 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-06-30  7:43 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Muhammad Usama Anjum, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

Don't deinitialize the device context while going into suspend or
hibernation cycles. Otherwise the resume may fail if at resume time, the
memory pressure is high and no dma memory is available.

Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
 drivers/bus/mhi/host/init.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
index 2e0f18c939e68..8f56e73fdc42e 100644
--- a/drivers/bus/mhi/host/init.c
+++ b/drivers/bus/mhi/host/init.c
@@ -1133,9 +1133,11 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
 
 	mutex_lock(&mhi_cntrl->pm_mutex);
 
-	ret = mhi_init_dev_ctxt(mhi_cntrl);
-	if (ret)
-		goto error_dev_ctxt;
+	if (!mhi_cntrl->mhi_ctxt) {
+		ret = mhi_init_dev_ctxt(mhi_cntrl);
+		if (ret)
+			goto error_dev_ctxt;
+	}
 
 	ret = mhi_read_reg(mhi_cntrl, mhi_cntrl->regs, BHIOFF, &bhi_off);
 	if (ret) {
@@ -1212,8 +1214,6 @@ void mhi_deinit_dev_ctxt(struct mhi_controller *mhi_cntrl)
 {
 	mhi_cntrl->bhi = NULL;
 	mhi_cntrl->bhie = NULL;
-
-	__mhi_deinit_dev_ctxt(mhi_cntrl);
 }
 
 void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
@@ -1234,6 +1234,7 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
 	}
 
 	mhi_deinit_dev_ctxt(mhi_cntrl);
+	__mhi_deinit_dev_ctxt(mhi_cntrl);
 }
 EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle
  2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
@ 2025-07-01 10:25   ` Greg Kroah-Hartman
  2025-07-02 15:24     ` Muhammad Usama Anjum
  2025-07-08  9:47   ` Krishna Chaitanya Chundru
  1 sibling, 1 reply; 27+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-01 10:25 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alexander Wilhelm, Alex Elder,
	Kunwu Chan, Siddartha Mohanadoss, Sujeev Dias, Julia Lawall,
	John Crispin, Muna Sinada, Venkateswara Naralasetty,
	Maharaja Kennadyrajan, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, kernel

On Mon, Jun 30, 2025 at 12:43:28PM +0500, Muhammad Usama Anjum wrote:
> When there is memory pressure, at resume time dma_alloc_coherent()
> returns error which in turn fails the loading of firmware and hence
> the driver crashes:
> 
> kernel: kworker/u33:5: page allocation failure: order:7,
> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted
> 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1
> 3843143b92e9da0fa2d3d5f21f51beaed15c7d59

Please don't wrap kernel log lines.

> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  dump_stack_lvl+0x4e/0x70
> kernel:  warn_alloc+0x164/0x190
> kernel:  ? srso_return_thunk+0x5/0x5f
> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> kernel:  __alloc_pages_noprof+0x321/0x350
> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> kernel:  dma_direct_alloc+0x70/0x270
> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi
> a96cb91daba500cc77f86bad60c1f332dc3babdf]
> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi
> a96cb91daba500cc77f86bad60c1f332dc3babdf]

Same here.

> kernel:  ? srso_return_thunk+0x5/0x5f
> kernel:  process_one_work+0x17e/0x330
> kernel:  worker_thread+0x2ce/0x3f0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xd2/0x100
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x34/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1a/0x30
> kernel:  </TASK>
> kernel: Mem-Info:
> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
>     active_file:359315 inactive_file:2487001 isolated_file:0
>     unevictable:637 dirty:19 writeback:0
>     slab_reclaimable:160391 slab_unreclaimable:39729
>     mapped:175836 shmem:51039 pagetables:4415
>     sec_pagetables:0 bounce:0
>     kernel_misc_reclaimable:0
>     free:125666 free_pcp:0 free_cma:0
> 
> In above example, if we sum all the consumed memory, it comes out
> to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
> Even though memory is present. But all of the dma memory has been
> exhausted or fragmented.
> 
> Fix it by allocating it only once and then reuse the same allocated
> memory. As we'll allocate this memory only once, this memory will stay
> allocated.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: cd457afb1667 ("bus: mhi: core: Add support for downloading firmware over BHIe")

No cc: stable?

> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
> Reported here:
> https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com
> 
> Still a lot of more fixes are required. Hence, I'm not adding closes tag.
> ---
>  drivers/bus/mhi/host/boot.c     | 19 ++++++++++---------
>  drivers/bus/mhi/host/init.c     |  5 +++++
>  drivers/bus/mhi/host/internal.h |  2 ++
>  include/linux/mhi.h             |  1 +
>  4 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
> index b3a85aa3c4768..11bb8c12ac597 100644
> --- a/drivers/bus/mhi/host/boot.c
> +++ b/drivers/bus/mhi/host/boot.c
> @@ -302,8 +302,8 @@ static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,
>  	return -EIO;
>  }
>  
> -static void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
> -				struct image_info *image_info)
> +void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
> +			 struct image_info *image_info)
>  {
>  	struct mhi_buf *mhi_buf = image_info->mhi_buf;
>  
> @@ -455,18 +455,19 @@ static enum mhi_fw_load_type mhi_fw_load_type_get(const struct mhi_controller *m
>  
>  static int mhi_load_image_bhi(struct mhi_controller *mhi_cntrl, const u8 *fw_data, size_t size)
>  {
> -	struct image_info *image;
> +	struct image_info *image = mhi_cntrl->bhi_image;
>  	int ret;
>  
> -	ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
> -	if (ret)
> -		return ret;
> +	if (!image) {

What prevents image from going away right after you tested it?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
@ 2025-07-01 10:25   ` Greg Kroah-Hartman
  2025-07-02 15:25     ` Muhammad Usama Anjum
  2025-07-01 14:49   ` Jeff Johnson
  2025-07-02  3:50   ` Baochen Qiang
  2 siblings, 1 reply; 27+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-01 10:25 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alexander Wilhelm, Alex Elder,
	Kunwu Chan, Siddartha Mohanadoss, Sujeev Dias, Julia Lawall,
	John Crispin, Muna Sinada, Venkateswara Naralasetty,
	Maharaja Kennadyrajan, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, kernel

On Mon, Jun 30, 2025 at 12:43:29PM +0500, Muhammad Usama Anjum wrote:
> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
> deallocated and there is high possibility that we'll not be able to get
> the same memory allocated from dma when there is high memory pressure.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")

No cc: stable?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-06-30  7:43 ` [PATCH 3/3] bus: mhi: keep device context through suspend cycles Muhammad Usama Anjum
@ 2025-07-01 10:26   ` Greg Kroah-Hartman
  2025-07-08 10:15   ` Krishna Chaitanya Chundru
  1 sibling, 0 replies; 27+ messages in thread
From: Greg Kroah-Hartman @ 2025-07-01 10:26 UTC (permalink / raw)
  To: Muhammad Usama Anjum
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alexander Wilhelm, Alex Elder,
	Kunwu Chan, Siddartha Mohanadoss, Sujeev Dias, Julia Lawall,
	John Crispin, Muna Sinada, Venkateswara Naralasetty,
	Maharaja Kennadyrajan, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, kernel

On Mon, Jun 30, 2025 at 12:43:30PM +0500, Muhammad Usama Anjum wrote:
> Don't deinitialize the device context while going into suspend or
> hibernation cycles. Otherwise the resume may fail if at resume time, the
> memory pressure is high and no dma memory is available.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")

No cc: stable?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
  2025-07-01 10:25   ` Greg Kroah-Hartman
@ 2025-07-01 14:49   ` Jeff Johnson
  2025-07-02 15:28     ` Muhammad Usama Anjum
  2025-07-02  3:50   ` Baochen Qiang
  2 siblings, 1 reply; 27+ messages in thread
From: Jeff Johnson @ 2025-07-01 14:49 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote:

Subject has incorrect prefix, should be "wifi: ath11k: "

And ideally it should mention HAL SRNG since it is specific to that allocation

> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
> deallocated and there is high possibility that we'll not be able to get
> the same memory allocated from dma when there is high memory pressure.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6

not quite the right format since it is missing hw version and bus

> 
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
> index 4488e4cdc5e9e..bc4930fe6a367 100644
> --- a/drivers/net/wireless/ath/ath11k/core.c
> +++ b/drivers/net/wireless/ath/ath11k/core.c
> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>  	mutex_unlock(&ab->core_lock);
>  
>  	ath11k_dp_free(ab);
> -	ath11k_hal_srng_deinit(ab);
>  
>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>  
> -	ret = ath11k_hal_srng_init(ab);
> -	if (ret)
> -		return ret;
> -
>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>  
>  	ret = ath11k_core_qmi_firmware_ready(ab);

does this patch have any dependency upon the 1/3 patch?
if not, then it should be sent separately since it should go through the ath
tree instead of through the mhi tree.

/jeff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
  2025-07-01 10:25   ` Greg Kroah-Hartman
  2025-07-01 14:49   ` Jeff Johnson
@ 2025-07-02  3:50   ` Baochen Qiang
  2025-07-02 16:12     ` Muhammad Usama Anjum
  2 siblings, 1 reply; 27+ messages in thread
From: Baochen Qiang @ 2025-07-02  3:50 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
> deallocated and there is high possibility that we'll not be able to get
> the same memory allocated from dma when there is high memory pressure.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
> index 4488e4cdc5e9e..bc4930fe6a367 100644
> --- a/drivers/net/wireless/ath/ath11k/core.c
> +++ b/drivers/net/wireless/ath/ath11k/core.c
> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>  	mutex_unlock(&ab->core_lock);
>  
>  	ath11k_dp_free(ab);
> -	ath11k_hal_srng_deinit(ab);
>  
>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>  
> -	ret = ath11k_hal_srng_init(ab);
> -	if (ret)
> -		return ret;
> -

while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
_deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
_init() needs to be kept as the later operation needs a clean state of them.

>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>  
>  	ret = ath11k_core_qmi_firmware_ready(ab);

the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
making it a little odd since there is no _init() anymore with this change, though this is
the way of current logic (I mean the hal is currently deinit in the error path).

Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
(at least I see a double free of hal->srng_config). But this is another topic which can be
fixed in a separate patch.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle
  2025-07-01 10:25   ` Greg Kroah-Hartman
@ 2025-07-02 15:24     ` Muhammad Usama Anjum
  0 siblings, 0 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-02 15:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alexander Wilhelm, Alex Elder,
	Kunwu Chan, Siddartha Mohanadoss, Sujeev Dias, Julia Lawall,
	John Crispin, Muna Sinada, Venkateswara Naralasetty,
	Maharaja Kennadyrajan, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, kernel

Thanks for reviewing!

On 7/1/25 3:25 PM, Greg Kroah-Hartman wrote:
> On Mon, Jun 30, 2025 at 12:43:28PM +0500, Muhammad Usama Anjum wrote:
>> When there is memory pressure, at resume time dma_alloc_coherent()
>> returns error which in turn fails the loading of firmware and hence
>> the driver crashes:
>>
>> kernel: kworker/u33:5: page allocation failure: order:7,
>> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted
>> 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1
>> 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
> 
> Please don't wrap kernel log lines.
Sorry, missed fixing it. I'll fix it.

> 
>> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
>> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
>> kernel: Call Trace:
>> kernel:  <TASK>
>> kernel:  dump_stack_lvl+0x4e/0x70
>> kernel:  warn_alloc+0x164/0x190
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
>> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
>> kernel:  __alloc_pages_noprof+0x321/0x350
>> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
>> kernel:  dma_direct_alloc+0x70/0x270
>> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi
>> a96cb91daba500cc77f86bad60c1f332dc3babdf]
>> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi
>> a96cb91daba500cc77f86bad60c1f332dc3babdf]
> 
> Same here.
I'll fix it.

> 
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  process_one_work+0x17e/0x330
>> kernel:  worker_thread+0x2ce/0x3f0
>> kernel:  ? __pfx_worker_thread+0x10/0x10
>> kernel:  kthread+0xd2/0x100
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork+0x34/0x50
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork_asm+0x1a/0x30
>> kernel:  </TASK>
>> kernel: Mem-Info:
>> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
>>     active_file:359315 inactive_file:2487001 isolated_file:0
>>     unevictable:637 dirty:19 writeback:0
>>     slab_reclaimable:160391 slab_unreclaimable:39729
>>     mapped:175836 shmem:51039 pagetables:4415
>>     sec_pagetables:0 bounce:0
>>     kernel_misc_reclaimable:0
>>     free:125666 free_pcp:0 free_cma:0
>>
>> In above example, if we sum all the consumed memory, it comes out
>> to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
>> Even though memory is present. But all of the dma memory has been
>> exhausted or fragmented.
>>
>> Fix it by allocating it only once and then reuse the same allocated
>> memory. As we'll allocate this memory only once, this memory will stay
>> allocated.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Fixes: cd457afb1667 ("bus: mhi: core: Add support for downloading firmware over BHIe")
> 
> No cc: stable?
I'll add stable in v2.

> 
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>> Reported here:
>> https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com
>>
>> Still a lot of more fixes are required. Hence, I'm not adding closes tag.
>> ---
>>  drivers/bus/mhi/host/boot.c     | 19 ++++++++++---------
>>  drivers/bus/mhi/host/init.c     |  5 +++++
>>  drivers/bus/mhi/host/internal.h |  2 ++
>>  include/linux/mhi.h             |  1 +
>>  4 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
>> index b3a85aa3c4768..11bb8c12ac597 100644
>> --- a/drivers/bus/mhi/host/boot.c
>> +++ b/drivers/bus/mhi/host/boot.c
>> @@ -302,8 +302,8 @@ static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,
>>  	return -EIO;
>>  }
>>  
>> -static void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
>> -				struct image_info *image_info)
>> +void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
>> +			 struct image_info *image_info)
>>  {
>>  	struct mhi_buf *mhi_buf = image_info->mhi_buf;
>>  
>> @@ -455,18 +455,19 @@ static enum mhi_fw_load_type mhi_fw_load_type_get(const struct mhi_controller *m
>>  
>>  static int mhi_load_image_bhi(struct mhi_controller *mhi_cntrl, const u8 *fw_data, size_t size)
>>  {
>> -	struct image_info *image;
>> +	struct image_info *image = mhi_cntrl->bhi_image;
>>  	int ret;
>>  
>> -	ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
>> -	if (ret)
>> -		return ret;
>> +	if (!image) {
> 
> What prevents image from going away right after you tested it?
The changed code isn't doing what it should be doing. I'll completely fix in v2. 

> 
> thanks,
> 
> greg k-h


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-01 10:25   ` Greg Kroah-Hartman
@ 2025-07-02 15:25     ` Muhammad Usama Anjum
  0 siblings, 0 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-02 15:25 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo, Youssef Samir,
	Matthew Leung, Yan Zhen, Alexander Wilhelm, Alex Elder,
	Kunwu Chan, Siddartha Mohanadoss, Sujeev Dias, Julia Lawall,
	John Crispin, Muna Sinada, Venkateswara Naralasetty,
	Maharaja Kennadyrajan, mhi, linux-arm-msm, linux-kernel,
	linux-wireless, ath11k, kernel

On 7/1/25 3:25 PM, Greg Kroah-Hartman wrote:
> On Mon, Jun 30, 2025 at 12:43:29PM +0500, Muhammad Usama Anjum wrote:
>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>> deallocated and there is high possibility that we'll not be able to get
>> the same memory allocated from dma when there is high memory pressure.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> 
> No cc: stable?
I'll add it to entire series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-01 14:49   ` Jeff Johnson
@ 2025-07-02 15:28     ` Muhammad Usama Anjum
  2025-07-02 17:25       ` Jeff Johnson
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-02 15:28 UTC (permalink / raw)
  To: Jeff Johnson, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

Thank you for reviewing!

On 7/1/25 7:49 PM, Jeff Johnson wrote:
> On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote:
> 
> Subject has incorrect prefix, should be "wifi: ath11k: "
> 
> And ideally it should mention HAL SRNG since it is specific to that allocation
I'll fix the subject.

> 
>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>> deallocated and there is high possibility that we'll not be able to get
>> the same memory allocated from dma when there is high memory pressure.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> not quite the right format since it is missing hw version and bus
I've been using the same tag from last accepted patches. How to construct the
correct patch?

> 
>>
>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>> --- a/drivers/net/wireless/ath/ath11k/core.c
>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>  	mutex_unlock(&ab->core_lock);
>>  
>>  	ath11k_dp_free(ab);
>> -	ath11k_hal_srng_deinit(ab);
>>  
>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>  
>> -	ret = ath11k_hal_srng_init(ab);
>> -	if (ret)
>> -		return ret;
>> -
>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>  
>>  	ret = ath11k_core_qmi_firmware_ready(ab);
> 
> does this patch have any dependency upon the 1/3 patch?
No

> if not, then it should be sent separately since it should go through the ath
> tree instead of through the mhi tree.
Ohh... I see. I'll send it separately.

> 
> /jeff


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-02  3:50   ` Baochen Qiang
@ 2025-07-02 16:12     ` Muhammad Usama Anjum
  2025-07-03  1:59       ` Baochen Qiang
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-02 16:12 UTC (permalink / raw)
  To: Baochen Qiang, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

Thank you for reviewing.

On 7/2/25 8:50 AM, Baochen Qiang wrote:
> 
> 
> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>> deallocated and there is high possibility that we'll not be able to get
>> the same memory allocated from dma when there is high memory pressure.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>> --- a/drivers/net/wireless/ath/ath11k/core.c
>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>  	mutex_unlock(&ab->core_lock);
>>  
>>  	ath11k_dp_free(ab);
>> -	ath11k_hal_srng_deinit(ab);
>>  
>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>  
>> -	ret = ath11k_hal_srng_init(ab);
>> -	if (ret)
>> -		return ret;
>> -
> 
> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
in resume handler? So when device wakes up, its state is already correct.

I'm not sure why it worked every time when I tested it on my device.

> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
> _init() needs to be kept as the later operation needs a clean state of them.
So should we just memset these 3?


> 
>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>  
>>  	ret = ath11k_core_qmi_firmware_ready(ab);
> 
> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
> making it a little odd since there is no _init() anymore with this change, though this is
> the way of current logic (I mean the hal is currently deinit in the error path).
> 
> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
> (at least I see a double free of hal->srng_config). But this is another topic which can be
> fixed in a separate patch.

I don't think this is the problem as HAL is already initialized when before the system has
suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or
maybe I've missed something?

 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-02 15:28     ` Muhammad Usama Anjum
@ 2025-07-02 17:25       ` Jeff Johnson
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Johnson @ 2025-07-02 17:25 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 7/2/2025 8:28 AM, Muhammad Usama Anjum wrote:
> On 7/1/25 7:49 PM, Jeff Johnson wrote:
>> On 6/30/2025 12:43 AM, Muhammad Usama Anjum wrote:
>>> the same memory allocated from dma when there is high memory pressure.
>>>
>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> not quite the right format since it is missing hw version and bus
> I've been using the same tag from last accepted patches. How to construct the
> correct patch?

https://wireless.docs.kernel.org/en/latest/en/users/drivers/ath12k/submittingpatches.html#tested-on-tag


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-02 16:12     ` Muhammad Usama Anjum
@ 2025-07-03  1:59       ` Baochen Qiang
  2025-07-07  8:19         ` Muhammad Usama Anjum
  0 siblings, 1 reply; 27+ messages in thread
From: Baochen Qiang @ 2025-07-03  1:59 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Baochen Qiang, Manivannan Sadhasivam,
	Jeff Johnson, Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote:
> Thank you for reviewing.
> 
> On 7/2/25 8:50 AM, Baochen Qiang wrote:
>>
>>
>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>>> deallocated and there is high possibility that we'll not be able to get
>>> the same memory allocated from dma when there is high memory pressure.
>>>
>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>
>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>> ---
>>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>>  1 file changed, 5 deletions(-)
>>>
>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>  	mutex_unlock(&ab->core_lock);
>>>  
>>>  	ath11k_dp_free(ab);
>>> -	ath11k_hal_srng_deinit(ab);
>>>  
>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>  
>>> -	ret = ath11k_hal_srng_init(ab);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>
>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
> in resume handler? So when device wakes up, its state is already correct.
> 

Hmm... not quite understand your question. Can you elaborate?

> I'm not sure why it worked every time when I tested it on my device.
> 
>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
>> _init() needs to be kept as the later operation needs a clean state of them.
> So should we just memset these 3?

more than them I think. We need to take care of all entries in hal since current code is
memset them all.

> 
> 
>>
>>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>>  
>>>  	ret = ath11k_core_qmi_firmware_ready(ab);
>>
>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
>> making it a little odd since there is no _init() anymore with this change, though this is
>> the way of current logic (I mean the hal is currently deinit in the error path).
>>
>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
>> (at least I see a double free of hal->srng_config). But this is another topic which can be
>> fixed in a separate patch.
> 
> I don't think this is the problem as HAL is already initialized when before the system has
> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or
> maybe I've missed something?

Yeah, it is OK in normal path. However in error path we face issues.

> 
>  
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-03  1:59       ` Baochen Qiang
@ 2025-07-07  8:19         ` Muhammad Usama Anjum
  2025-07-07  9:00           ` Baochen Qiang
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-07  8:19 UTC (permalink / raw)
  To: Baochen Qiang, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 7/3/25 6:59 AM, Baochen Qiang wrote:
> 
> 
> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote:
>> Thank you for reviewing.
>>
>> On 7/2/25 8:50 AM, Baochen Qiang wrote:
>>>
>>>
>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>>>> deallocated and there is high possibility that we'll not be able to get
>>>> the same memory allocated from dma when there is high memory pressure.
>>>>
>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>>
>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>>> ---
>>>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>>>  1 file changed, 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>  	mutex_unlock(&ab->core_lock);
>>>>  
>>>>  	ath11k_dp_free(ab);
>>>> -	ath11k_hal_srng_deinit(ab);
>>>>  
>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>  
>>>> -	ret = ath11k_hal_srng_init(ab);
>>>> -	if (ret)
>>>> -		return ret;
>>>> -
>>>
>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>> in resume handler? So when device wakes up, its state is already correct.
>>
> 
> Hmm... not quite understand your question. Can you elaborate?

I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
* The driver has been loaded and has been working fine.
* The user called suspend. So all devices would be suspended.
* In suspend handler of the ath11k, we should do the necessary cleanups of the states
  like hal.
* When the device would resume after long time, the hal would have the correct state
  already. So we'll not need to deinit and init again.

> 
>> I'm not sure why it worked every time when I tested it on my device.
>>
>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
>>> _init() needs to be kept as the later operation needs a clean state of them.
>> So should we just memset these 3?
> 
> more than them I think. We need to take care of all entries in hal since current code is
> memset them all.
I see. So memset the whole ath11k hal structure other than the config.

> 
>>
>>
>>>
>>>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>>>  
>>>>  	ret = ath11k_core_qmi_firmware_ready(ab);
>>>
>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
>>> making it a little odd since there is no _init() anymore with this change, though this is
>>> the way of current logic (I mean the hal is currently deinit in the error path).
>>>
>>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
>>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
>>> (at least I see a double free of hal->srng_config). But this is another topic which can be
>>> fixed in a separate patch.
>>
>> I don't think this is the problem as HAL is already initialized when before the system has
>> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or
>> maybe I've missed something?
> 
> Yeah, it is OK in normal path. However in error path we face issues.
For example:
* When driver was initialized the first time, the hal was init.
* Then system was suspended and hal doesn't get deinit.
* At system resume, the hal is already init. We can memset some status variables. But its
  initialized already from the first time. (considering this patch that deinit/init have
  been removed)
* So at this stage if some error occurs and we can call the deinit in the error paths.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-07  8:19         ` Muhammad Usama Anjum
@ 2025-07-07  9:00           ` Baochen Qiang
  2025-07-07 13:11             ` Muhammad Usama Anjum
  0 siblings, 1 reply; 27+ messages in thread
From: Baochen Qiang @ 2025-07-07  9:00 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 7/7/2025 4:19 PM, Muhammad Usama Anjum wrote:
> On 7/3/25 6:59 AM, Baochen Qiang wrote:
>>
>>
>> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote:
>>> Thank you for reviewing.
>>>
>>> On 7/2/25 8:50 AM, Baochen Qiang wrote:
>>>>
>>>>
>>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>>>>> deallocated and there is high possibility that we'll not be able to get
>>>>> the same memory allocated from dma when there is high memory pressure.
>>>>>
>>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>>>
>>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>>>> ---
>>>>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>>>>  1 file changed, 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>  	mutex_unlock(&ab->core_lock);
>>>>>  
>>>>>  	ath11k_dp_free(ab);
>>>>> -	ath11k_hal_srng_deinit(ab);
>>>>>  
>>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>  
>>>>> -	ret = ath11k_hal_srng_init(ab);
>>>>> -	if (ret)
>>>>> -		return ret;
>>>>> -
>>>>
>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>> in resume handler? So when device wakes up, its state is already correct.
>>>
>>
>> Hmm... not quite understand your question. Can you elaborate?
> 
> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
> * The driver has been loaded and has been working fine.
> * The user called suspend. So all devices would be suspended.
> * In suspend handler of the ath11k, we should do the necessary cleanups of the states
>   like hal.
> * When the device would resume after long time, the hal would have the correct state
>   already. So we'll not need to deinit and init again.

The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover
process. So If we are moving the cleanup to suspend handler, similar stuff needs to be
done for reset/recover as well.

> 
>>
>>> I'm not sure why it worked every time when I tested it on my device.
>>>
>>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
>>>> _init() needs to be kept as the later operation needs a clean state of them.
>>> So should we just memset these 3?
>>
>> more than them I think. We need to take care of all entries in hal since current code is
>> memset them all.
> I see. So memset the whole ath11k hal structure other than the config.
> 
>>
>>>
>>>
>>>>
>>>>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>>>>  
>>>>>  	ret = ath11k_core_qmi_firmware_ready(ab);
>>>>
>>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
>>>> making it a little odd since there is no _init() anymore with this change, though this is
>>>> the way of current logic (I mean the hal is currently deinit in the error path).
>>>>
>>>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
>>>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
>>>> (at least I see a double free of hal->srng_config). But this is another topic which can be
>>>> fixed in a separate patch.
>>>
>>> I don't think this is the problem as HAL is already initialized when before the system has
>>> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or
>>> maybe I've missed something?
>>
>> Yeah, it is OK in normal path. However in error path we face issues.
> For example:
> * When driver was initialized the first time, the hal was init.
> * Then system was suspended and hal doesn't get deinit.
> * At system resume, the hal is already init. We can memset some status variables. But its
>   initialized already from the first time. (considering this patch that deinit/init have
>   been removed)
> * So at this stage if some error occurs and we can call the deinit in the error paths.
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-07  9:00           ` Baochen Qiang
@ 2025-07-07 13:11             ` Muhammad Usama Anjum
  2025-07-08  1:43               ` Baochen Qiang
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-07 13:11 UTC (permalink / raw)
  To: Baochen Qiang, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 7/7/25 2:00 PM, Baochen Qiang wrote:
> 
> 
> On 7/7/2025 4:19 PM, Muhammad Usama Anjum wrote:
>> On 7/3/25 6:59 AM, Baochen Qiang wrote:
>>>
>>>
>>> On 7/3/2025 12:12 AM, Muhammad Usama Anjum wrote:
>>>> Thank you for reviewing.
>>>>
>>>> On 7/2/25 8:50 AM, Baochen Qiang wrote:
>>>>>
>>>>>
>>>>> On 6/30/2025 3:43 PM, Muhammad Usama Anjum wrote:
>>>>>> Don't deinitialize and reinitialize the HAL helpers. The dma memory is
>>>>>> deallocated and there is high possibility that we'll not be able to get
>>>>>> the same memory allocated from dma when there is high memory pressure.
>>>>>>
>>>>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>>>>
>>>>>> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
>>>>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>>>>> ---
>>>>>>  drivers/net/wireless/ath/ath11k/core.c | 5 -----
>>>>>>  1 file changed, 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>>  	mutex_unlock(&ab->core_lock);
>>>>>>  
>>>>>>  	ath11k_dp_free(ab);
>>>>>> -	ath11k_hal_srng_deinit(ab);
>>>>>>  
>>>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>>  
>>>>>> -	ret = ath11k_hal_srng_init(ab);
>>>>>> -	if (ret)
>>>>>> -		return ret;
>>>>>> -
>>>>>
>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>>> in resume handler? So when device wakes up, its state is already correct.
>>>>
>>>
>>> Hmm... not quite understand your question. Can you elaborate?
>>
>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
>> * The driver has been loaded and has been working fine.
>> * The user called suspend. So all devices would be suspended.
>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states
>>   like hal.
>> * When the device would resume after long time, the hal would have the correct state
>>   already. So we'll not need to deinit and init again.
> 
> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover
> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be
> done for reset/recover as well.
It makes sense.

So clearing the hal structure completely other than ab->hal.srn_config doesn't seem
right. I've also tested it and it crashes the whole system.

On contrary, with only the current patch applied, there is no abnormality.

num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0,
driver still keeps on working.

	ab->hal.num_shadow_reg_configured = 0;
	ab->hal.avail_blk_resource = 0;
	ab->hal.current_blk_index = 0;

As you have suggested setting these 3 to zero, is there any other variable in hal
structure which should be set to zero?
> 
>>
>>>
>>>> I'm not sure why it worked every time when I tested it on my device.
>>>>
>>>>> avail_blk_resource, current_blk_index and num_shadow_reg_configured etc.) inside the
>>>>> _init() needs to be kept as the later operation needs a clean state of them.
>>>> So should we just memset these 3?
>>>
>>> more than them I think. We need to take care of all entries in hal since current code is
>>> memset them all.
>> I see. So memset the whole ath11k hal structure other than the config.
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>>>>>>  
>>>>>>  	ret = ath11k_core_qmi_firmware_ready(ab);
>>>>>
>>>>> the _deinit() is still getting called in case ath11k_core_qmi_firmware_ready() fails,
>>>>> making it a little odd since there is no _init() anymore with this change, though this is
>>>>> the way of current logic (I mean the hal is currently deinit in the error path).
>>>>>
>>>>> Thinking it more, if we hit the error path, seems the only way is to remove ath11k module.
>>>>> In that case the _deinit() would be called again in ath11k_pci_remove(), leading to issues
>>>>> (at least I see a double free of hal->srng_config). But this is another topic which can be
>>>>> fixed in a separate patch.
>>>>
>>>> I don't think this is the problem as HAL is already initialized when before the system has
>>>> suspended. So by removing deinit() and init() pairs, the HAL still remains initialized. Or
>>>> maybe I've missed something?
>>>
>>> Yeah, it is OK in normal path. However in error path we face issues.
>> For example:
>> * When driver was initialized the first time, the hal was init.
>> * Then system was suspended and hal doesn't get deinit.
>> * At system resume, the hal is already init. We can memset some status variables. But its
>>   initialized already from the first time. (considering this patch that deinit/init have
>>   been removed)
>> * So at this stage if some error occurs and we can call the deinit in the error paths.
>>
>>
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-07 13:11             ` Muhammad Usama Anjum
@ 2025-07-08  1:43               ` Baochen Qiang
  2025-07-08  9:12                 ` Muhammad Usama Anjum
  0 siblings, 1 reply; 27+ messages in thread
From: Baochen Qiang @ 2025-07-08  1:43 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote:
>>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>>>  	mutex_unlock(&ab->core_lock);
>>>>>>>  
>>>>>>>  	ath11k_dp_free(ab);
>>>>>>> -	ath11k_hal_srng_deinit(ab);
>>>>>>>  
>>>>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>>>  
>>>>>>> -	ret = ath11k_hal_srng_init(ab);
>>>>>>> -	if (ret)
>>>>>>> -		return ret;
>>>>>>> -
>>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>>>> in resume handler? So when device wakes up, its state is already correct.
>>>>>
>>>> Hmm... not quite understand your question. Can you elaborate?
>>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
>>> * The driver has been loaded and has been working fine.
>>> * The user called suspend. So all devices would be suspended.
>>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states
>>>   like hal.
>>> * When the device would resume after long time, the hal would have the correct state
>>>   already. So we'll not need to deinit and init again.
>> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover
>> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be
>> done for reset/recover as well.
> It makes sense.
> 
> So clearing the hal structure completely other than ab->hal.srn_config doesn't seem
> right. I've also tested it and it crashes the whole system.
> 
> On contrary, with only the current patch applied, there is no abnormality.
> 
> num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0,
> driver still keeps on working.
> 
> 	ab->hal.num_shadow_reg_configured = 0;
> 	ab->hal.avail_blk_resource = 0;
> 	ab->hal.current_blk_index = 0;
> 
> As you have suggested setting these 3 to zero, is there any other variable in hal
> structure which should be set to zero?

IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other
fields should be cleared/reinitialized.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-08  1:43               ` Baochen Qiang
@ 2025-07-08  9:12                 ` Muhammad Usama Anjum
  2025-07-08 10:38                   ` Baochen Qiang
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-08  9:12 UTC (permalink / raw)
  To: Baochen Qiang, Manivannan Sadhasivam, Jeff Johnson, Jeff Hugo,
	Youssef Samir, Matthew Leung, Yan Zhen, Alexander Wilhelm,
	Alex Elder, Kunwu Chan, Greg Kroah-Hartman, Siddartha Mohanadoss,
	Sujeev Dias, Julia Lawall, John Crispin, Muna Sinada,
	Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 7/8/25 6:43 AM, Baochen Qiang wrote:
> 
> 
> On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote:
>>>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>>>>  	mutex_unlock(&ab->core_lock);
>>>>>>>>  
>>>>>>>>  	ath11k_dp_free(ab);
>>>>>>>> -	ath11k_hal_srng_deinit(ab);
>>>>>>>>  
>>>>>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>>>>  
>>>>>>>> -	ret = ath11k_hal_srng_init(ab);
>>>>>>>> -	if (ret)
>>>>>>>> -		return ret;
>>>>>>>> -
>>>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>>>>> in resume handler? So when device wakes up, its state is already correct.
>>>>>>
>>>>> Hmm... not quite understand your question. Can you elaborate?
>>>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
>>>> * The driver has been loaded and has been working fine.
>>>> * The user called suspend. So all devices would be suspended.
>>>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states
>>>>   like hal.
>>>> * When the device would resume after long time, the hal would have the correct state
>>>>   already. So we'll not need to deinit and init again.
>>> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover
>>> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be
>>> done for reset/recover as well.
>> It makes sense.
>>
>> So clearing the hal structure completely other than ab->hal.srn_config doesn't seem
>> right. I've also tested it and it crashes the whole system.
>>
>> On contrary, with only the current patch applied, there is no abnormality.
>>
>> num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0,
>> driver still keeps on working.
>>
>> 	ab->hal.num_shadow_reg_configured = 0;
>> 	ab->hal.avail_blk_resource = 0;
>> 	ab->hal.current_blk_index = 0;
>>
>> As you have suggested setting these 3 to zero, is there any other variable in hal
>> structure which should be set to zero?
> 
> IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other
> fields should be cleared/reinitialized.

memseting srng_list and shadow_reg_addr causes crashes. Please can you confirm why do you
think those should be memset. Here is WIP patch:


--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -2213,14 +2213,10 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
 	mutex_unlock(&ab->core_lock);
 
 	ath11k_dp_free(ab);
-	ath11k_hal_srng_deinit(ab);
+	ath11k_hal_srng_clear(ab);
 
 	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
 
-	ret = ath11k_hal_srng_init(ab);
-	if (ret)
-		return ret;
-
 	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
 
 	ret = ath11k_core_qmi_firmware_ready(ab);
diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index b32de563d453a..d4be040acf2c8 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -1359,6 +1359,19 @@ void ath11k_hal_srng_deinit(struct ath11k_base *ab)
 }
 EXPORT_SYMBOL(ath11k_hal_srng_deinit);
 
+void ath11k_hal_srng_clear(struct ath11k_base *ab)
+{
+// --> both of these memset causes crashes
+//	memset(ab->hal.srng_list, 0,
+//	       sizeof(ab->hal.srng_list) * HAL_SRNG_RING_ID_MAX);
+//	memset(ab->hal.shadow_reg_addr, 0,
+//	       sizeof(ab->hal.shadow_reg_addr) * HAL_SHADOW_NUM_REGS);
+	ab->hal.avail_blk_resource = 0;
+	ab->hal.current_blk_index = 0;
+	ab->hal.num_shadow_reg_configured = 0;
+}
+EXPORT_SYMBOL(ath11k_hal_srng_clear);
+
 void ath11k_hal_dump_srng_stats(struct ath11k_base *ab)
 {
 	struct hal_srng *srng;
diff --git a/drivers/net/wireless/ath/ath11k/hal.h b/drivers/net/wireless/ath/ath11k/hal.h
index 601542410c752..839095af9267e 100644
--- a/drivers/net/wireless/ath/ath11k/hal.h
+++ b/drivers/net/wireless/ath/ath11k/hal.h
@@ -965,6 +965,7 @@ int ath11k_hal_srng_setup(struct ath11k_base *ab, enum hal_ring_type type,
 			  struct hal_srng_params *params);
 int ath11k_hal_srng_init(struct ath11k_base *ath11k);
 void ath11k_hal_srng_deinit(struct ath11k_base *ath11k);
+void ath11k_hal_srng_clear(struct ath11k_base *ab);
 void ath11k_hal_dump_srng_stats(struct ath11k_base *ab);
 void ath11k_hal_srng_get_shadow_config(struct ath11k_base *ab,
 				       u32 **cfg, u32 *len);



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle
  2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
  2025-07-01 10:25   ` Greg Kroah-Hartman
@ 2025-07-08  9:47   ` Krishna Chaitanya Chundru
  2025-07-10 14:14     ` Muhammad Usama Anjum
  1 sibling, 1 reply; 27+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-07-08  9:47 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 6/30/2025 1:13 PM, Muhammad Usama Anjum wrote:
> When there is memory pressure, at resume time dma_alloc_coherent()
> returns error which in turn fails the loading of firmware and hence
> the driver crashes:
> 
why only bhi? bhie can also have same issue.
> kernel: kworker/u33:5: page allocation failure: order:7,
> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted
> 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1
> 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> kernel: Call Trace:
> kernel:  <TASK>
> kernel:  dump_stack_lvl+0x4e/0x70
> kernel:  warn_alloc+0x164/0x190
> kernel:  ? srso_return_thunk+0x5/0x5f
> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> kernel:  __alloc_pages_noprof+0x321/0x350
> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
> kernel:  dma_direct_alloc+0x70/0x270
> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi
> a96cb91daba500cc77f86bad60c1f332dc3babdf]
> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi
> a96cb91daba500cc77f86bad60c1f332dc3babdf]
> kernel:  ? srso_return_thunk+0x5/0x5f
> kernel:  process_one_work+0x17e/0x330
> kernel:  worker_thread+0x2ce/0x3f0
> kernel:  ? __pfx_worker_thread+0x10/0x10
> kernel:  kthread+0xd2/0x100
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork+0x34/0x50
> kernel:  ? __pfx_kthread+0x10/0x10
> kernel:  ret_from_fork_asm+0x1a/0x30
> kernel:  </TASK>
> kernel: Mem-Info:
> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
>      active_file:359315 inactive_file:2487001 isolated_file:0
>      unevictable:637 dirty:19 writeback:0
>      slab_reclaimable:160391 slab_unreclaimable:39729
>      mapped:175836 shmem:51039 pagetables:4415
>      sec_pagetables:0 bounce:0
>      kernel_misc_reclaimable:0
>      free:125666 free_pcp:0 free_cma:0
> 
> In above example, if we sum all the consumed memory, it comes out
> to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
> Even though memory is present. But all of the dma memory has been
> exhausted or fragmented.
> 
> Fix it by allocating it only once and then reuse the same allocated
> memory. As we'll allocate this memory only once, this memory will stay
> allocated.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: cd457afb1667 ("bus: mhi: core: Add support for downloading firmware over BHIe")
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
> Reported here:
> https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com
> 
> Still a lot of more fixes are required. Hence, I'm not adding closes tag.
> ---
>   drivers/bus/mhi/host/boot.c     | 19 ++++++++++---------
>   drivers/bus/mhi/host/init.c     |  5 +++++
>   drivers/bus/mhi/host/internal.h |  2 ++
>   include/linux/mhi.h             |  1 +
>   4 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
> index b3a85aa3c4768..11bb8c12ac597 100644
> --- a/drivers/bus/mhi/host/boot.c
> +++ b/drivers/bus/mhi/host/boot.c
> @@ -302,8 +302,8 @@ static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,
>   	return -EIO;
>   }
>   
> -static void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
> -				struct image_info *image_info)
> +void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
> +			 struct image_info *image_info)
>   {
>   	struct mhi_buf *mhi_buf = image_info->mhi_buf;
>   
> @@ -455,18 +455,19 @@ static enum mhi_fw_load_type mhi_fw_load_type_get(const struct mhi_controller *m
>   
>   static int mhi_load_image_bhi(struct mhi_controller *mhi_cntrl, const u8 *fw_data, size_t size)
>   {
> -	struct image_info *image;
> +	struct image_info *image = mhi_cntrl->bhi_image;
>   	int ret;
>   
> -	ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
> -	if (ret)
> -		return ret;
> +	if (!image) {
> +		ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
> +		if (ret)
> +			return ret;
>   
> -	/* Load the firmware into BHI vec table */
> -	memcpy(image->mhi_buf->buf, fw_data, size);
> +		/* Load the firmware into BHI vec table */
> +		memcpy(image->mhi_buf->buf, fw_data, size);
> +	}
>   
>   	ret = mhi_fw_load_bhi(mhi_cntrl, &image->mhi_buf[image->entries - 1]);
if mhi fw load fails didn't we need to free bhi buffer.

- Krishna Chaitanya.
> -	mhi_free_bhi_buffer(mhi_cntrl, image);
>   
>   	return ret;
>   }
> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
> index 6e06e4efec765..2e0f18c939e68 100644
> --- a/drivers/bus/mhi/host/init.c
> +++ b/drivers/bus/mhi/host/init.c
> @@ -1228,6 +1228,11 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>   		mhi_cntrl->rddm_image = NULL;
>   	}
>   
> +	if (mhi_cntrl->bhi_image) {
> +		mhi_free_bhi_buffer(mhi_cntrl, mhi_cntrl->bhi_image);
> +		mhi_cntrl->bhi_image = NULL;
> +	}
> +
>   	mhi_deinit_dev_ctxt(mhi_cntrl);
>   }
>   EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
> diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h
> index 1054e67bb450d..60b0699323375 100644
> --- a/drivers/bus/mhi/host/internal.h
> +++ b/drivers/bus/mhi/host/internal.h
> @@ -324,6 +324,8 @@ int mhi_alloc_bhie_table(struct mhi_controller *mhi_cntrl,
>   			 struct image_info **image_info, size_t alloc_size);
>   void mhi_free_bhie_table(struct mhi_controller *mhi_cntrl,
>   			 struct image_info *image_info);
> +void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
> +			 struct image_info *image_info);
>   
>   /* Power management APIs */
>   enum mhi_pm_state __must_check mhi_tryset_pm_state(
> diff --git a/include/linux/mhi.h b/include/linux/mhi.h
> index 4c567907933a5..593012f779d97 100644
> --- a/include/linux/mhi.h
> +++ b/include/linux/mhi.h
> @@ -391,6 +391,7 @@ struct mhi_controller {
>   	size_t reg_len;
>   	struct image_info *fbc_image;
>   	struct image_info *rddm_image;
> +	struct image_info *bhi_image;
>   	struct mhi_chan *mhi_chan;
>   	struct list_head lpm_chans;
>   	int *irq;

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-06-30  7:43 ` [PATCH 3/3] bus: mhi: keep device context through suspend cycles Muhammad Usama Anjum
  2025-07-01 10:26   ` Greg Kroah-Hartman
@ 2025-07-08 10:15   ` Krishna Chaitanya Chundru
  2025-07-10 15:07     ` Muhammad Usama Anjum
  1 sibling, 1 reply; 27+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-07-08 10:15 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 6/30/2025 1:13 PM, Muhammad Usama Anjum wrote:
> Don't deinitialize the device context while going into suspend or
> hibernation cycles. Otherwise the resume may fail if at resume time, the
> memory pressure is high and no dma memory is available.
> 
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
> 
> Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
> ---
>   drivers/bus/mhi/host/init.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
> index 2e0f18c939e68..8f56e73fdc42e 100644
> --- a/drivers/bus/mhi/host/init.c
> +++ b/drivers/bus/mhi/host/init.c
> @@ -1133,9 +1133,11 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
>   
>   	mutex_lock(&mhi_cntrl->pm_mutex);
>   
> -	ret = mhi_init_dev_ctxt(mhi_cntrl);
mhi init dev ctxt also initializes the ring pointers to base value,
I think we should take care of them also ?

- Krishna Chaitanya.
> -	if (ret)
> -		goto error_dev_ctxt;
> +	if (!mhi_cntrl->mhi_ctxt) {
> +		ret = mhi_init_dev_ctxt(mhi_cntrl);
> +		if (ret)
> +			goto error_dev_ctxt;
> +	}
>   
>   	ret = mhi_read_reg(mhi_cntrl, mhi_cntrl->regs, BHIOFF, &bhi_off);
>   	if (ret) {
> @@ -1212,8 +1214,6 @@ void mhi_deinit_dev_ctxt(struct mhi_controller *mhi_cntrl)
>   {
>   	mhi_cntrl->bhi = NULL;
>   	mhi_cntrl->bhie = NULL;
> -
> -	__mhi_deinit_dev_ctxt(mhi_cntrl);
>   }
>   
>   void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
> @@ -1234,6 +1234,7 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>   	}
>   
>   	mhi_deinit_dev_ctxt(mhi_cntrl);
> +	__mhi_deinit_dev_ctxt(mhi_cntrl);
>   }
>   EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
>   

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again
  2025-07-08  9:12                 ` Muhammad Usama Anjum
@ 2025-07-08 10:38                   ` Baochen Qiang
  0 siblings, 0 replies; 27+ messages in thread
From: Baochen Qiang @ 2025-07-08 10:38 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 7/8/2025 5:12 PM, Muhammad Usama Anjum wrote:
> On 7/8/25 6:43 AM, Baochen Qiang wrote:
>>
>>
>> On 7/7/2025 9:11 PM, Muhammad Usama Anjum wrote:
>>>>>>>>> diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>>> index 4488e4cdc5e9e..bc4930fe6a367 100644
>>>>>>>>> --- a/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>>> +++ b/drivers/net/wireless/ath/ath11k/core.c
>>>>>>>>> @@ -2213,14 +2213,9 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>>>>>>>>>  	mutex_unlock(&ab->core_lock);
>>>>>>>>>  
>>>>>>>>>  	ath11k_dp_free(ab);
>>>>>>>>> -	ath11k_hal_srng_deinit(ab);
>>>>>>>>>  
>>>>>>>>>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>>>>>>>>>  
>>>>>>>>> -	ret = ath11k_hal_srng_init(ab);
>>>>>>>>> -	if (ret)
>>>>>>>>> -		return ret;
>>>>>>>>> -
>>>>>>>> while I agree there is no need of a dealloc/realloc, we can not simply remove calling the
>>>>>>>> _deinit()/_init() pair. At least the memset() cleanup to hal parameters (e.g.
>>>>>>> Why do is it being done in the resume handler? Shouldn't those parameters be cleaned up
>>>>>>> in resume handler? So when device wakes up, its state is already correct.
>>>>>>>
>>>>>> Hmm... not quite understand your question. Can you elaborate?
>>>>> I'm trying to understand the possibility of cleanup of hal in suspend handler. For example:
>>>>> * The driver has been loaded and has been working fine.
>>>>> * The user called suspend. So all devices would be suspended.
>>>>> * In suspend handler of the ath11k, we should do the necessary cleanups of the states
>>>>>   like hal.
>>>>> * When the device would resume after long time, the hal would have the correct state
>>>>>   already. So we'll not need to deinit and init again.
>>>> The hal cleanup is not only needed by suspend/resume, but also a step of reset/recover
>>>> process. So If we are moving the cleanup to suspend handler, similar stuff needs to be
>>>> done for reset/recover as well.
>>> It makes sense.
>>>
>>> So clearing the hal structure completely other than ab->hal.srn_config doesn't seem
>>> right. I've also tested it and it crashes the whole system.
>>>
>>> On contrary, with only the current patch applied, there is no abnormality.
>>>
>>> num_shadow_reg_configured and avail_blk_resource are non-zero. If I make them 0,
>>> driver still keeps on working.
>>>
>>> 	ab->hal.num_shadow_reg_configured = 0;
>>> 	ab->hal.avail_blk_resource = 0;
>>> 	ab->hal.current_blk_index = 0;
>>>
>>> As you have suggested setting these 3 to zero, is there any other variable in hal
>>> structure which should be set to zero?
>>
>> IMO srng_config, rdp, wrp and srng_key may keep unchanged through suspend/reset, all other
>> fields should be cleared/reinitialized.
> 
> memseting srng_list and shadow_reg_addr causes crashes. Please can you confirm why do you
> think those should be memset. Here is WIP patch:

We need to make sure they have a clean state while resume/recover.

> 
> 
> --- a/drivers/net/wireless/ath/ath11k/core.c
> +++ b/drivers/net/wireless/ath/ath11k/core.c
> @@ -2213,14 +2213,10 @@ static int ath11k_core_reconfigure_on_crash(struct ath11k_base *ab)
>  	mutex_unlock(&ab->core_lock);
>  
>  	ath11k_dp_free(ab);
> -	ath11k_hal_srng_deinit(ab);
> +	ath11k_hal_srng_clear(ab);
>  
>  	ab->free_vdev_map = (1LL << (ab->num_radios * TARGET_NUM_VDEVS(ab))) - 1;
>  
> -	ret = ath11k_hal_srng_init(ab);
> -	if (ret)
> -		return ret;
> -
>  	clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);
>  
>  	ret = ath11k_core_qmi_firmware_ready(ab);
> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> index b32de563d453a..d4be040acf2c8 100644
> --- a/drivers/net/wireless/ath/ath11k/hal.c
> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> @@ -1359,6 +1359,19 @@ void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>  }
>  EXPORT_SYMBOL(ath11k_hal_srng_deinit);
>  
> +void ath11k_hal_srng_clear(struct ath11k_base *ab)
> +{
> +// --> both of these memset causes crashes
> +//	memset(ab->hal.srng_list, 0,
> +//	       sizeof(ab->hal.srng_list) * HAL_SRNG_RING_ID_MAX);

You are memset too much, just sizeof(ab->hal.srng_list) is OK.

> +//	memset(ab->hal.shadow_reg_addr, 0,
> +//	       sizeof(ab->hal.shadow_reg_addr) * HAL_SHADOW_NUM_REGS);

same here

> +	ab->hal.avail_blk_resource = 0;
> +	ab->hal.current_blk_index = 0;
> +	ab->hal.num_shadow_reg_configured = 0;
> +}
> +EXPORT_SYMBOL(ath11k_hal_srng_clear);
> +
>  void ath11k_hal_dump_srng_stats(struct ath11k_base *ab)
>  {
>  	struct hal_srng *srng;
> diff --git a/drivers/net/wireless/ath/ath11k/hal.h b/drivers/net/wireless/ath/ath11k/hal.h
> index 601542410c752..839095af9267e 100644
> --- a/drivers/net/wireless/ath/ath11k/hal.h
> +++ b/drivers/net/wireless/ath/ath11k/hal.h
> @@ -965,6 +965,7 @@ int ath11k_hal_srng_setup(struct ath11k_base *ab, enum hal_ring_type type,
>  			  struct hal_srng_params *params);
>  int ath11k_hal_srng_init(struct ath11k_base *ath11k);
>  void ath11k_hal_srng_deinit(struct ath11k_base *ath11k);
> +void ath11k_hal_srng_clear(struct ath11k_base *ab);
>  void ath11k_hal_dump_srng_stats(struct ath11k_base *ab);
>  void ath11k_hal_srng_get_shadow_config(struct ath11k_base *ab,
>  				       u32 **cfg, u32 *len);
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle
  2025-07-08  9:47   ` Krishna Chaitanya Chundru
@ 2025-07-10 14:14     ` Muhammad Usama Anjum
  0 siblings, 0 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-10 14:14 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

Thank you so much for review.

On 7/8/25 2:47 PM, Krishna Chaitanya Chundru wrote:
> 
> 
> On 6/30/2025 1:13 PM, Muhammad Usama Anjum wrote:
>> When there is memory pressure, at resume time dma_alloc_coherent()
>> returns error which in turn fails the loading of firmware and hence
>> the driver crashes:
>>
> why only bhi? bhie can also have same issue.
I was thinking I'd handled all bhie cases in earlier patch. But I haven't. I'll post
fix for bhie in next version.

>> kernel: kworker/u33:5: page allocation failure: order:7,
>> mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>> kernel: CPU: 1 UID: 0 PID: 7693 Comm: kworker/u33:5 Not tainted
>> 6.11.11-valve17-1-neptune-611-g027868a0ac03 #1
>> 3843143b92e9da0fa2d3d5f21f51beaed15c7d59
>> kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
>> kernel: Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
>> kernel: Call Trace:
>> kernel:  <TASK>
>> kernel:  dump_stack_lvl+0x4e/0x70
>> kernel:  warn_alloc+0x164/0x190
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  ? __alloc_pages_direct_compact+0xaf/0x360
>> kernel:  __alloc_pages_slowpath.constprop.0+0xc75/0xd70
>> kernel:  __alloc_pages_noprof+0x321/0x350
>> kernel:  __dma_direct_alloc_pages.isra.0+0x14a/0x290
>> kernel:  dma_direct_alloc+0x70/0x270
>> kernel:  mhi_fw_load_handler+0x126/0x340 [mhi
>> a96cb91daba500cc77f86bad60c1f332dc3babdf]
>> kernel:  mhi_pm_st_worker+0x5e8/0xac0 [mhi
>> a96cb91daba500cc77f86bad60c1f332dc3babdf]
>> kernel:  ? srso_return_thunk+0x5/0x5f
>> kernel:  process_one_work+0x17e/0x330
>> kernel:  worker_thread+0x2ce/0x3f0
>> kernel:  ? __pfx_worker_thread+0x10/0x10
>> kernel:  kthread+0xd2/0x100
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork+0x34/0x50
>> kernel:  ? __pfx_kthread+0x10/0x10
>> kernel:  ret_from_fork_asm+0x1a/0x30
>> kernel:  </TASK>
>> kernel: Mem-Info:
>> kernel: active_anon:513809 inactive_anon:152 isolated_anon:0
>>      active_file:359315 inactive_file:2487001 isolated_file:0
>>      unevictable:637 dirty:19 writeback:0
>>      slab_reclaimable:160391 slab_unreclaimable:39729
>>      mapped:175836 shmem:51039 pagetables:4415
>>      sec_pagetables:0 bounce:0
>>      kernel_misc_reclaimable:0
>>      free:125666 free_pcp:0 free_cma:0
>>
>> In above example, if we sum all the consumed memory, it comes out
>> to be 15.5GB and free memory is ~ 500MB from a total of 16GB RAM.
>> Even though memory is present. But all of the dma memory has been
>> exhausted or fragmented.
>>
>> Fix it by allocating it only once and then reuse the same allocated
>> memory. As we'll allocate this memory only once, this memory will stay
>> allocated.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Fixes: cd457afb1667 ("bus: mhi: core: Add support for downloading firmware over BHIe")
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>> Reported here:
>> https://lore.kernel.org/all/ead32f5b-730a-4b81-b38f-93d822f990c6@collabora.com
>>
>> Still a lot of more fixes are required. Hence, I'm not adding closes tag.
>> ---
>>   drivers/bus/mhi/host/boot.c     | 19 ++++++++++---------
>>   drivers/bus/mhi/host/init.c     |  5 +++++
>>   drivers/bus/mhi/host/internal.h |  2 ++
>>   include/linux/mhi.h             |  1 +
>>   4 files changed, 18 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
>> index b3a85aa3c4768..11bb8c12ac597 100644
>> --- a/drivers/bus/mhi/host/boot.c
>> +++ b/drivers/bus/mhi/host/boot.c
>> @@ -302,8 +302,8 @@ static int mhi_fw_load_bhi(struct mhi_controller *mhi_cntrl,
>>       return -EIO;
>>   }
>>   -static void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
>> -                struct image_info *image_info)
>> +void mhi_free_bhi_buffer(struct mhi_controller *mhi_cntrl,
>> +             struct image_info *image_info)
>>   {
>>       struct mhi_buf *mhi_buf = image_info->mhi_buf;
>>   @@ -455,18 +455,19 @@ static enum mhi_fw_load_type mhi_fw_load_type_get(const struct mhi_controller *m
>>     static int mhi_load_image_bhi(struct mhi_controller *mhi_cntrl, const u8 *fw_data, size_t size)
>>   {
>> -    struct image_info *image;
>> +    struct image_info *image = mhi_cntrl->bhi_image;
>>       int ret;
>>   -    ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
>> -    if (ret)
>> -        return ret;
>> +    if (!image) {
>> +        ret = mhi_alloc_bhi_buffer(mhi_cntrl, &image, size);
>> +        if (ret)
>> +            return ret;
>>   -    /* Load the firmware into BHI vec table */
>> -    memcpy(image->mhi_buf->buf, fw_data, size);
>> +        /* Load the firmware into BHI vec table */
>> +        memcpy(image->mhi_buf->buf, fw_data, size);
>> +    }
>>         ret = mhi_fw_load_bhi(mhi_cntrl, &image->mhi_buf[image->entries - 1]);
> if mhi fw load fails didn't we need to free bhi buffer.
Good point. I'll fix in v2.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-07-08 10:15   ` Krishna Chaitanya Chundru
@ 2025-07-10 15:07     ` Muhammad Usama Anjum
  2025-07-10 17:17       ` Krishna Chaitanya Chundru
  0 siblings, 1 reply; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-10 15:07 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

On 7/8/25 3:15 PM, Krishna Chaitanya Chundru wrote:
> 
> 
> On 6/30/2025 1:13 PM, Muhammad Usama Anjum wrote:
>> Don't deinitialize the device context while going into suspend or
>> hibernation cycles. Otherwise the resume may fail if at resume time, the
>> memory pressure is high and no dma memory is available.
>>
>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>
>> Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>>   drivers/bus/mhi/host/init.c | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
>> index 2e0f18c939e68..8f56e73fdc42e 100644
>> --- a/drivers/bus/mhi/host/init.c
>> +++ b/drivers/bus/mhi/host/init.c
>> @@ -1133,9 +1133,11 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
>>         mutex_lock(&mhi_cntrl->pm_mutex);
>>   -    ret = mhi_init_dev_ctxt(mhi_cntrl);
> mhi init dev ctxt also initializes the ring pointers to base value,
> I think we should take care of them also ?
Are you referring to mhi_rings? They are getting initialized inside
mhi_init_dev_ctxt() and de-initialized in __mhi_deinit_dev_ctxt(). That's
why I've not handled them separately.

> 
> - Krishna Chaitanya.
>> -    if (ret)
>> -        goto error_dev_ctxt;
>> +    if (!mhi_cntrl->mhi_ctxt) {
>> +        ret = mhi_init_dev_ctxt(mhi_cntrl);
>> +        if (ret)
>> +            goto error_dev_ctxt;
>> +    }
>>         ret = mhi_read_reg(mhi_cntrl, mhi_cntrl->regs, BHIOFF, &bhi_off);
>>       if (ret) {
>> @@ -1212,8 +1214,6 @@ void mhi_deinit_dev_ctxt(struct mhi_controller *mhi_cntrl)
>>   {
>>       mhi_cntrl->bhi = NULL;
>>       mhi_cntrl->bhie = NULL;
>> -
>> -    __mhi_deinit_dev_ctxt(mhi_cntrl);
>>   }
>>     void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>> @@ -1234,6 +1234,7 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>>       }
>>         mhi_deinit_dev_ctxt(mhi_cntrl);
>> +    __mhi_deinit_dev_ctxt(mhi_cntrl);
>>   }
>>   EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
>>   


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-07-10 15:07     ` Muhammad Usama Anjum
@ 2025-07-10 17:17       ` Krishna Chaitanya Chundru
  2025-07-11 10:22         ` Muhammad Usama Anjum
  0 siblings, 1 reply; 27+ messages in thread
From: Krishna Chaitanya Chundru @ 2025-07-10 17:17 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel



On 7/10/2025 8:37 PM, Muhammad Usama Anjum wrote:
> On 7/8/25 3:15 PM, Krishna Chaitanya Chundru wrote:
>>
>>
>> On 6/30/2025 1:13 PM, Muhammad Usama Anjum wrote:
>>> Don't deinitialize the device context while going into suspend or
>>> hibernation cycles. Otherwise the resume may fail if at resume time, the
>>> memory pressure is high and no dma memory is available.
>>>
>>> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>>>
>>> Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
>>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>> ---
>>>    drivers/bus/mhi/host/init.c | 11 ++++++-----
>>>    1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
>>> index 2e0f18c939e68..8f56e73fdc42e 100644
>>> --- a/drivers/bus/mhi/host/init.c
>>> +++ b/drivers/bus/mhi/host/init.c
>>> @@ -1133,9 +1133,11 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
>>>          mutex_lock(&mhi_cntrl->pm_mutex);
>>>    -    ret = mhi_init_dev_ctxt(mhi_cntrl);
>> mhi init dev ctxt also initializes the ring pointers to base value,
>> I think we should take care of them also ?
> Are you referring to mhi_rings? They are getting initialized inside
> mhi_init_dev_ctxt() and de-initialized in __mhi_deinit_dev_ctxt(). That's
> why I've not handled them separately.
> 
Maybe I was not clear in my previous comment/not a correct place to do
the comment.

My point you are not freeing __mhi_deinit_dev_ctxt as part of suspend,
that means we are expecting device will continue to use the rp and wr 
pointers of ring as the previous i.e before suspend pointers.

What if PCIe keeps link in D3cold as part of system suspend, will the
device able to handle the previous rp & wp of ring. I don't think
device can handle this.

- Krishna Chaitanya.
>>
>> - Krishna Chaitanya.
>>> -    if (ret)
>>> -        goto error_dev_ctxt;
>>> +    if (!mhi_cntrl->mhi_ctxt) {
>>> +        ret = mhi_init_dev_ctxt(mhi_cntrl);
>>> +        if (ret)
>>> +            goto error_dev_ctxt;
>>> +    }
>>>          ret = mhi_read_reg(mhi_cntrl, mhi_cntrl->regs, BHIOFF, &bhi_off);
>>>        if (ret) {
>>> @@ -1212,8 +1214,6 @@ void mhi_deinit_dev_ctxt(struct mhi_controller *mhi_cntrl)
>>>    {
>>>        mhi_cntrl->bhi = NULL;
>>>        mhi_cntrl->bhie = NULL;
>>> -
>>> -    __mhi_deinit_dev_ctxt(mhi_cntrl);
>>>    }
>>>      void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>>> @@ -1234,6 +1234,7 @@ void mhi_unprepare_after_power_down(struct mhi_controller *mhi_cntrl)
>>>        }
>>>          mhi_deinit_dev_ctxt(mhi_cntrl);
>>> +    __mhi_deinit_dev_ctxt(mhi_cntrl);
>>>    }
>>>    EXPORT_SYMBOL_GPL(mhi_unprepare_after_power_down);
>>>    
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/3] bus: mhi: keep device context through suspend cycles
  2025-07-10 17:17       ` Krishna Chaitanya Chundru
@ 2025-07-11 10:22         ` Muhammad Usama Anjum
  0 siblings, 0 replies; 27+ messages in thread
From: Muhammad Usama Anjum @ 2025-07-11 10:22 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru, Manivannan Sadhasivam, Jeff Johnson,
	Jeff Hugo, Youssef Samir, Matthew Leung, Yan Zhen,
	Alexander Wilhelm, Alex Elder, Kunwu Chan, Greg Kroah-Hartman,
	Siddartha Mohanadoss, Sujeev Dias, Julia Lawall, John Crispin,
	Muna Sinada, Venkateswara Naralasetty, Maharaja Kennadyrajan, mhi,
	linux-arm-msm, linux-kernel, linux-wireless, ath11k
  Cc: kernel

...
>>>> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c
>>>> index 2e0f18c939e68..8f56e73fdc42e 100644
>>>> --- a/drivers/bus/mhi/host/init.c
>>>> +++ b/drivers/bus/mhi/host/init.c
>>>> @@ -1133,9 +1133,11 @@ int mhi_prepare_for_power_up(struct mhi_controller *mhi_cntrl)
>>>>          mutex_lock(&mhi_cntrl->pm_mutex);
>>>>    -    ret = mhi_init_dev_ctxt(mhi_cntrl);
>>> mhi init dev ctxt also initializes the ring pointers to base value,
>>> I think we should take care of them also ?
>> Are you referring to mhi_rings? They are getting initialized inside
>> mhi_init_dev_ctxt() and de-initialized in __mhi_deinit_dev_ctxt(). That's
>> why I've not handled them separately.
>>
> Maybe I was not clear in my previous comment/not a correct place to do
> the comment.
> 
> My point you are not freeing __mhi_deinit_dev_ctxt as part of suspend,
> that means we are expecting device will continue to use the rp and wr pointers of ring as the previous i.e before suspend pointers.
> 
> What if PCIe keeps link in D3cold as part of system suspend, will the
> device able to handle the previous rp & wp of ring. I don't think
> device can handle this.
I don't have much internals logic of the driver. I've checked on my device and
the read/write pointers have 2 entries for mhi_event and 1 entry for mhi_cmd. But still
without resetting these, I've not got any problem. Any idea why?

I'll reset rings' read/write pointers in v2.



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-07-11 10:23 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-30  7:43 [PATCH 0/3] bus: mhi: keep dma buffers through suspend/hibernation cycles Muhammad Usama Anjum
2025-06-30  7:43 ` [PATCH 1/3] bus: mhi: host: keep bhi buffer through suspend cycle Muhammad Usama Anjum
2025-07-01 10:25   ` Greg Kroah-Hartman
2025-07-02 15:24     ` Muhammad Usama Anjum
2025-07-08  9:47   ` Krishna Chaitanya Chundru
2025-07-10 14:14     ` Muhammad Usama Anjum
2025-06-30  7:43 ` [PATCH 2/3] bus: mhi: don't deinitialize and re-initialize again Muhammad Usama Anjum
2025-07-01 10:25   ` Greg Kroah-Hartman
2025-07-02 15:25     ` Muhammad Usama Anjum
2025-07-01 14:49   ` Jeff Johnson
2025-07-02 15:28     ` Muhammad Usama Anjum
2025-07-02 17:25       ` Jeff Johnson
2025-07-02  3:50   ` Baochen Qiang
2025-07-02 16:12     ` Muhammad Usama Anjum
2025-07-03  1:59       ` Baochen Qiang
2025-07-07  8:19         ` Muhammad Usama Anjum
2025-07-07  9:00           ` Baochen Qiang
2025-07-07 13:11             ` Muhammad Usama Anjum
2025-07-08  1:43               ` Baochen Qiang
2025-07-08  9:12                 ` Muhammad Usama Anjum
2025-07-08 10:38                   ` Baochen Qiang
2025-06-30  7:43 ` [PATCH 3/3] bus: mhi: keep device context through suspend cycles Muhammad Usama Anjum
2025-07-01 10:26   ` Greg Kroah-Hartman
2025-07-08 10:15   ` Krishna Chaitanya Chundru
2025-07-10 15:07     ` Muhammad Usama Anjum
2025-07-10 17:17       ` Krishna Chaitanya Chundru
2025-07-11 10:22         ` Muhammad Usama Anjum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).