From: "Nicolas Escande" <nico.escande@gmail.com>
To: "Nithyanantham Paramasivam" <nithyanantham.paramasivam@oss.qualcomm.com>
Cc: <ath12k@lists.infradead.org>, <linux-wireless@vger.kernel.org>
Subject: Re: [PATCH ath-current 0/7] wifi: ath12k: Fix Issues in REO RX Queue Updates
Date: Thu, 07 Aug 2025 15:39:21 +0200 [thread overview]
Message-ID: <DBW8K6K30OQS.U5Z6UMSMEY7B@gmail.com> (raw)
In-Reply-To: <CAD1Z1J+jXWng4ma9_BPJh4N8b7AhTvRKtJ=tg1dTyk8E3wqupA@mail.gmail.com>
On Thu Aug 7, 2025 at 2:31 PM CEST, Nithyanantham Paramasivam wrote:
> On Thu, Aug 7, 2025 at 1:32 AM Nicolas Escande <nico.escande@gmail.com> wrote:
>>
>> On Wed Aug 6, 2025 at 1:17 PM CEST, Nithyanantham Paramasivam wrote:
>> > During stress test scenarios, when the REO command ring becomes full,
>> > the RX queue update command issued during peer deletion fails due to
>> > insufficient space. In response, the host performs a dma_unmap and
>> > frees the associated memory. However, the hardware still retains a
>> > reference to the same memory address. If the kernel later reallocates
>> > this address, unaware that the hardware is still using it, it can
>> > lead to memory corruption, since the host might access or modify
>> > memory that is still actively referenced by the hardware.
>> >
>> > Implement a retry mechanism for the HAL_REO_CMD_UPDATE_RX_QUEUE
>> > command during TID deletion to prevent memory corruption. Introduce
>> > a new list, reo_cmd_update_rx_queue_list, in the dp structure to
>> > track pending RX queue updates. Protect this list with
>> > reo_rxq_flush_lock, which also ensures synchronized access to
>> > reo_cmd_cache_flush_list. Defer memory release until hardware
>> > confirms the virtual address is no longer in use, avoiding immediate
>> > deallocation on command failure. Release memory for pending RX queue
>> > updates via ath12k_dp_rx_reo_cmd_list_cleanup() on system reset
>> > if hardware confirmation is not received.
>> >
>> > Additionally, increase DP_REO_CMD_RING_SIZE to 256 to reduce the
>> > likelihood of ring exhaustion. Use a 1KB cache flush command for
>> > QoS TID descriptors to improve efficiency.
>> >
>>
>> Hello,
>>
>> I'm not sure I fully understand so please correct where I'm wrong but from what
>> I got looking at the code:
>> - you increase the ring size for reo cmd
>> - you enable releasing multiple tid buffers at once
>> - as soon as you allocate the tid you create an entry in the list flagging it
>> as active
>> - when you need to clean up a tid
>> - you mark it as inactive in the list, then
>> - try to process the whole list by
>> - sending the reo command to release it
>> - if it succeeds you release it and remove the entry from the list
>> - on core exit, you re process the list
>>
>> What is bothering me with this is that when a vdev with multiple STAs goes
>> down, you will have a lot of these entries to push to the firmware at once:
>>
>> - So increasing the ring size / being able to release multiple entries at once
>> in one reo cmd may or may not be enough to handle the burst. It seems
>> that increasing those is just a band-aid on the underlying problem, but I
>> understand it's just to be on the safe side.
>> - If entries do not get sent/accepted by the firmware, we will need to wait
>> for another station removal before retrying, which could be a while.
>> - Or, in the worst case (one vdev going down and needing to release the tids
>> of all its stations), the more entries we have in the list, the less likely
>> we are to be able to push the deletes for all stations plus the ones still
>> in the queue.
>>
>> So, on station removal, why not just make things completely async: push the
>> tid deletes onto a list and wake a workqueue that tries to push them to the
>> firmware. If the ring is full, retry periodically.
>>
>> Thanks
>
> Hi Nicolas,
Hi
>
> Thanks for the detailed observations and suggestions.
>
> You're right in your understanding of the flow. To clarify further:
>
> When the host fails to obtain a buffer from the hardware to send a
> command—due to the REO command ring being full
> (ath12k_hal_reo_cmd_send returning -ENOBUFS)—we treat it as a command
> send failure and avoid deleting the corresponding entry immediately.
>
> This situation typically arises in the flow:
>
> ath12k_dp_rx_process_reo_cmd_update_rx_queue_list →
> ath12k_dp_rx_tid_delete_handler → returns -ENOBUFS
>
> Given the command ring size is 256, and each peer generally has 16
> TIDs, each TID sends 2 commands (RXQ update and cache flush). So,
> deleting one peer involves up to 32 commands. This means we can delete
> up to 8 peers (8 × 32 = 256 commands) before hitting the ring limit.
>
> From the 9th peer onward, we may encounter ring exhaustion. However,
> we handle retries in the REO command callback
> (ath12k_dp_rx_tid_del_func). If commands fail to send, they remain in
> the pending list and can be retried via the success callback of
> earlier commands.
That was the part I did not get; I thought it was just another peer removal
that would cause the whole list of pending entries to be reprocessed and pushed
to the ring.
>
> Do we foresee any issue with this?
>
Nope, it should work.
> Regarding your suggestion to make the deletion process fully
> asynchronous via a workqueue, that’s a valid point. Workqueue-based
> handling is a good idea, but we need to account for potential delays
> if the worker thread isn’t scheduled promptly. We also need to
> consider timeout-based rescheduling, especially during command
> failures, to ensure retries happen in a timely manner. We’ll evaluate
> this suggestion further and get back to you.
>
IMHO it feels like it would simplify the code to just push + wake a wq on
delete, and reschedule if no space is available in the ring. The timing should
not be such a big deal, right? The important part is to eventually push all reo
commands to the firmware, but it should not have an impact on the rest of the
operation if they reach the firmware later rather than sooner, right?
> Thanks again for the insightful feedback!
Thanks for all the explanations
>
> Best Regards,
> Nithy
Thread overview: 11+ messages
2025-08-06 11:17 [PATCH ath-current 0/7] wifi: ath12k: Fix Issues in REO RX Queue Updates Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 1/7] wifi: ath12k: Increase DP_REO_CMD_RING_SIZE to 256 Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 2/7] wifi: ath12k: Refactor RX TID deletion handling into helper function Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 3/7] wifi: ath12k: Refactor RX TID buffer cleanup " Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 4/7] wifi: ath12k: Refactor REO command to use ath12k_dp_rx_tid_rxq Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 5/7] wifi: ath12k: Add Retry Mechanism for REO RX Queue Update Failures Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 6/7] wifi: ath12k: Fix flush cache failure during RX queue update Nithyanantham Paramasivam
2025-08-06 11:17 ` [PATCH ath-current 7/7] wifi: ath12k: Use 1KB Cache Flush Command for QoS TID Descriptors Nithyanantham Paramasivam
2025-08-06 20:02 ` [PATCH ath-current 0/7] wifi: ath12k: Fix Issues in REO RX Queue Updates Nicolas Escande
2025-08-07 12:31 ` Nithyanantham Paramasivam
2025-08-07 13:39 ` Nicolas Escande [this message]