From: Fedor Pchelkin <pchelkin@ispras.ru>
To: Hillf Danton <hdanton@sina.com>
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>,
"Kalle Valo" <kvalo@kernel.org>,
syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com,
linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Alexey Khoroshilov" <khoroshilov@ispras.ru>,
lvc-project@linuxtesting.org
Subject: Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
Date: Thu, 18 May 2023 18:44:24 +0300
Message-ID: <20230518154424.62urbguy4rxetkty@fpc>
In-Reply-To: <20230518102437.4443-1-hdanton@sina.com>
On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> writes:
>
> > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote:
> >> Given similar wait timeout[1], just taking lock on the waiter side is not
> >> enough wrt fixing the race, because in case job done on the waker side,
> >> waiter needs to wait again after timeout.
> >>
> >
> > As I understand you correctly, you mean the case when a timeout occurs
> > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> > occurred on a waiter's side, it should return immediately and doesn't have
> > to care in which state the callback has been at that moment.
> >
> > AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> > waker sides, and there is no data corruption.
> >
> > If a callback has not managed to do its work entirely (performing a
> > completion and subsequently waking waiting thread is included here), then,
> > well, it is considered a timeout, in my opinion.
> >
> > Your suggestion makes a wmi_cmd call to give a little more chance for the
> > belated callback to complete (although timeout has actually expired). That
> > is probably good, but increasing a timeout value makes that job, too. I
> > don't think it makes any sense on real hardware.
> >
> > Or do you mean there is data corruption that is properly fixed with your patch?
>
> Given complete() not paired with wait_for_completion(), what is the
> difference after this patch?
The main change in the patch is that ath9k_wmi_ctrl_rx() now releases
wmi_lock only after calling ath9k_wmi_rsp_callback(), which copies data
into the shared wmi->cmd_rsp_buf buffer. Without that, the data corruption
scenario outlined in the patch description can occur (quoted here again).
On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote:
> CPU0                                    CPU1
>
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   ath9k_wmi_cmd_issue(...)
>   wait_for_completion_timeout(...)
>   ---
>   timeout
>   ---
>                                         /* the callback is being processed
>                                          * before last_seq_id became zero
>                                          */
>                                         ath9k_wmi_ctrl_rx(...)
>                                           spin_lock_irqsave(...)
>                                           /* wmi->last_seq_id check here
>                                            * doesn't detect timeout yet
>                                            */
>                                           spin_unlock_irqrestore(...)
>   /* last_seq_id is zeroed to
>    * indicate there was a timeout
>    */
>   wmi->last_seq_id = 0
>   mutex_unlock(&wmi->op_mutex)
>   return -ETIMEDOUT
>
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   /* the buffer is replaced with
>    * another one
>    */
>   wmi->cmd_rsp_buf = rsp_buf
>   wmi->cmd_rsp_len = rsp_len
>   ath9k_wmi_cmd_issue(...)
>     spin_lock_irqsave(...)
>     spin_unlock_irqrestore(...)
>   wait_for_completion_timeout(...)
>                                         /* the continuation of the
>                                          * callback left after the first
>                                          * ath9k_wmi_cmd call
>                                          */
>                                           ath9k_wmi_rsp_callback(...)
>                                             /* copying data designated
>                                              * to already timeouted
>                                              * WMI command into an
>                                              * inappropriate wmi_cmd_buf
>                                              */
>                                             memcpy(...)
>                                             complete(&wmi->cmd_wait)
>   /* awakened by the bogus callback
>    * => invalid return result
>    */
>   mutex_unlock(&wmi->op_mutex)
>   return 0
So before the patch, the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
did not help if wmi->last_seq_id was changed by the next ath9k_wmi_cmd()
call before ath9k_wmi_rsp_callback() had finished.
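To make the interleaving above concrete, here is a minimal single-threaded
userspace replay of it. This is only a sketch, not driver code: struct
wmi_model and the model_*() helpers are hypothetical stand-ins for the
driver state and for the two halves of the pre-patch ath9k_wmi_ctrl_rx()
path (the check under the lock, and the copy after the lock is dropped).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the driver's WMI state. */
struct wmi_model {
	uint16_t last_seq_id;	/* 0 means "no command in flight" */
	uint8_t *cmd_rsp_buf;	/* where the response gets copied */
	size_t cmd_rsp_len;
};

/* Waiter side: set up a command, as ath9k_wmi_cmd() would. */
static void model_issue(struct wmi_model *w, uint16_t seq,
			uint8_t *buf, size_t len)
{
	w->cmd_rsp_buf = buf;
	w->cmd_rsp_len = len;
	w->last_seq_id = seq;
}

/* Waker side, step 1: seq id check (done under the lock pre-patch). */
static int model_check(const struct wmi_model *w, uint16_t rsp_seq)
{
	return w->last_seq_id == rsp_seq;
}

/* Waker side, step 2: the copy (done after unlocking pre-patch). */
static void model_copy(struct wmi_model *w, const uint8_t *rsp)
{
	memcpy(w->cmd_rsp_buf, rsp, w->cmd_rsp_len);
}

/*
 * Replays the interleaving step by step. Returns 1 if the response
 * meant for the first command landed in the second command's buffer.
 */
static int replay_prepatch_race(void)
{
	struct wmi_model w = {0};
	uint8_t buf1[4] = {0}, buf2[4] = {0};
	const uint8_t rsp1[4] = {0xAA, 0xAA, 0xAA, 0xAA};

	model_issue(&w, 1, buf1, sizeof(buf1));	/* CPU0: first command */
	int passed = model_check(&w, 1);	/* CPU1: check passes */
	w.last_seq_id = 0;			/* CPU0: timeout */
	model_issue(&w, 2, buf2, sizeof(buf2));	/* CPU0: second command */
	if (passed)
		model_copy(&w, rsp1);		/* CPU1: stale copy */

	return buf2[0] == 0xAA && buf1[0] == 0;
}
```

The check and the copy are deliberately two separate calls here, so the
second command can slip in between them exactly as in the diagram.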
With the proposed patch, the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
does its job because:
- the next ath9k_wmi_cmd() call changes last_seq_id under the lock, so
  it either waits for a belated ath9k_wmi_ctrl_rx() to finish, or updates
  last_seq_id first, in which case the check in ath9k_wmi_ctrl_rx() sees
  that the waiter side has timed out and the callback is not processed
  any further.
- the memcpy() in ath9k_wmi_rsp_callback() writes to a valid buffer,
  because it runs only if the last_seq_id check succeeded under the lock.
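The two points above can be sketched in userspace as follows. Again this
is only an illustration under stated assumptions: struct wmi_model and the
model_*() functions are hypothetical names, a pthread mutex stands in for
the wmi_lock spinlock, and the sketch also publishes the response buffer
under that same lock to keep the model simple.

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the driver's WMI state. */
struct wmi_model {
	pthread_mutex_t lock;	/* stands in for wmi->wmi_lock */
	uint16_t last_seq_id;	/* 0 means "no command in flight" */
	uint8_t *cmd_rsp_buf;
	size_t cmd_rsp_len;
};

/* Waiter side: publish a new command under the lock. */
static void model_issue(struct wmi_model *w, uint16_t seq,
			uint8_t *buf, size_t len)
{
	pthread_mutex_lock(&w->lock);
	w->cmd_rsp_buf = buf;
	w->cmd_rsp_len = len;
	w->last_seq_id = seq;
	pthread_mutex_unlock(&w->lock);
}

/* Waiter side: record a timeout under the lock. */
static void model_timeout(struct wmi_model *w)
{
	pthread_mutex_lock(&w->lock);
	w->last_seq_id = 0;
	pthread_mutex_unlock(&w->lock);
}

/*
 * Waker side: the seq id check and the copy share one critical
 * section, so the waiter cannot change the state in between.
 * Returns 1 if the response was delivered, 0 if it was dropped.
 */
static int model_ctrl_rx(struct wmi_model *w, uint16_t rsp_seq,
			 const uint8_t *rsp)
{
	int delivered = 0;

	pthread_mutex_lock(&w->lock);
	if (w->last_seq_id != 0 && w->last_seq_id == rsp_seq) {
		memcpy(w->cmd_rsp_buf, rsp, w->cmd_rsp_len);
		delivered = 1;
	}
	pthread_mutex_unlock(&w->lock);

	return delivered;
}
```

With this shape, a response carrying the sequence id of a command that has
already timed out is dropped, while a response matching the command
currently in flight is copied into that command's buffer.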