From: Brian Norris <briannorris@chromium.org>
To: Kalle Valo <kvalo@codeaurora.org>
Cc: Carl Huang <cjhuang@codeaurora.org>,
linux-wireless@vger.kernel.org, ath10k@lists.infradead.org,
Wen Gong <wgong@codeaurora.org>
Subject: Re: [PATCH] ath10k: pci: use mutex for diagnostic window CE polling
Date: Mon, 25 Mar 2019 13:27:07 -0700
Message-ID: <20190325202706.GA68720@google.com>
In-Reply-To: <20190207014143.41529-1-briannorris@chromium.org>
Hi Kalle,
On Wed, Feb 06, 2019 at 05:41:43PM -0800, Brian Norris wrote:
> The DIAG copy engine is only used via polling, but it holds a spinlock
> with softirqs disabled. Each iteration of our read/write loops can
> theoretically take 20ms (two 10ms timeout loops), and this loop can be
> run an unbounded number of times while holding the spinlock -- dependent
> on the request size given by the caller.
>
> As of commit 39501ea64116 ("ath10k: download firmware via diag Copy
> Engine for QCA6174 and QCA9377."), we transfer large chunks of firmware
> memory using this mechanism. With large enough firmware segments, this
> becomes an exceedingly long period for disabling soft IRQs. For example,
> with a 500KiB firmware segment, in testing QCA6174A, I see 200 loop
> iterations of about 50-100us each, which can total about 10-20ms.
>
> In reality, we don't really need to block softirqs for this duration.
> The DIAG CE is only used in polling mode, and we only need to hold
> ce_lock to make sure any CE bookkeeping is done without screwing up
> another CE. Otherwise, we only need to ensure exclusion between
> ath10k_pci_diag_{read,write}_mem() contexts.
>
> This patch moves to use fine-grained locking for the shared ce_lock,
> while adding a new mutex just to ensure mutual exclusion of diag
> read/write operations.
>
> Tested on QCA6174A, firmware version WLAN.RM.4.4.1-00132-QCARMSWPZ-1.
>
> Fixes: 39501ea64116 ("ath10k: download firmware via diag Copy Engine for QCA6174 and QCA9377.")
> Signed-off-by: Brian Norris <briannorris@chromium.org>
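For reference, the locking scheme described in the quoted patch can be sketched roughly as below. This is a userspace model under stated assumptions, not the driver code: `struct diag_dev`, `diag_copy_chunk()`, `diag_read_mem()` and `DIAG_CHUNK` are hypothetical names, and a pthread mutex stands in for the ce_lock spinlock (it models mutual exclusion only, not softirq disabling).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DIAG_CHUNK 2048 /* assumed chunk size, for illustration only */

/* Hypothetical model of the two locks discussed in the patch. */
struct diag_dev {
	pthread_mutex_t ce_lock;    /* stands in for the shared CE spinlock */
	pthread_mutex_t diag_mutex; /* serializes whole diag read/write calls */
	unsigned int ce_lock_taken; /* number of short ce_lock sections */
	size_t bytes_copied;
};

static void diag_dev_init(struct diag_dev *dev)
{
	pthread_mutex_init(&dev->ce_lock, NULL);
	pthread_mutex_init(&dev->diag_mutex, NULL);
	dev->ce_lock_taken = 0;
	dev->bytes_copied = 0;
}

/* One transfer step: only the bookkeeping runs under ce_lock. */
static void diag_copy_chunk(struct diag_dev *dev, size_t n)
{
	pthread_mutex_lock(&dev->ce_lock);
	dev->ce_lock_taken++;
	dev->bytes_copied += n;
	pthread_mutex_unlock(&dev->ce_lock);
	/* Polling for completion would happen here, with ce_lock dropped. */
}

/* The whole request is covered by diag_mutex, never by ce_lock. */
static size_t diag_read_mem(struct diag_dev *dev, size_t nbytes)
{
	size_t remaining = nbytes;

	pthread_mutex_lock(&dev->diag_mutex);
	while (remaining > 0) {
		size_t n = remaining < DIAG_CHUNK ? remaining : DIAG_CHUNK;

		diag_copy_chunk(dev, n);
		remaining -= n;
	}
	pthread_mutex_unlock(&dev->diag_mutex);

	return nbytes;
}
```

The point of the split is that the long, request-size-dependent loop is only ever covered by the sleepable lock, while the shared lock is held for one short bookkeeping step at a time.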
It would appear that this triggers new warnings

  BUG: sleeping function called from invalid context

when handling firmware crashes. The call stack is

  ath10k_pci_fw_crashed_dump
    -> ath10k_pci_dump_memory
    ...
    -> ath10k_pci_diag_read_mem

and the problem is that we're holding the 'data_lock' spinlock with
softirqs disabled, while later trying to grab this new mutex.
Unfortunately, data_lock is used in a lot of places, and it's unclear if
it can be migrated to a mutex as well. It seems like it probably can be,
but I'd have to audit a little more closely.
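To make the failure mode concrete, here is a small userspace model of it. It is only a sketch: every `model_*` name is hypothetical, the counter merely imitates the idea of an atomic-context depth, and nothing here is the kernel's actual implementation of the check.

```c
#include <assert.h>

/* Imitates the "atomic depth" that a held spinlock raises. */
static int atomic_depth;
/* Counts how often a sleeping lock was taken in atomic context. */
static int sleep_in_atomic_warnings;

/* Stand-ins for taking/releasing data_lock with softirqs disabled. */
static void model_spin_lock_bh(void)   { atomic_depth++; }
static void model_spin_unlock_bh(void) { atomic_depth--; }

/* A mutex may sleep, so taking it while atomic is flagged. */
static void model_mutex_lock(void)
{
	if (atomic_depth > 0)
		sleep_in_atomic_warnings++;
}

static void model_mutex_unlock(void) { }

/* After the patch, the diag read path takes the new mutex. */
static void model_diag_read_mem(void)
{
	model_mutex_lock();
	/* diag window polling would happen here */
	model_mutex_unlock();
}

/* Crash-dump path as described above: data_lock held around the dump. */
static void model_fw_crashed_dump(void)
{
	model_spin_lock_bh();   /* data_lock */
	model_diag_read_mem();  /* takes the mutex while atomic */
	model_spin_unlock_bh();
}
```

Calling `model_diag_read_mem()` on its own raises no warning in this model, while calling it through `model_fw_crashed_dump()` does, which mirrors the nesting in the call stack above.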
Any thoughts on what the short- and long-term solutions should be? I can
send a revert, to get v5.1 fixed. But it still seems like we should
avoid disabling softirqs for so long.
Brian