linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fedor Pchelkin <pchelkin@ispras.ru>
To: Hillf Danton <hdanton@sina.com>
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>,
	"Kalle Vallo" <kvalo@kernel.org>,
	syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	"Alexey Khoroshilov" <khoroshilov@ispras.ru>,
	lvc-project@linuxtesting.org
Subject: Re: [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx
Date: Thu, 18 May 2023 18:44:24 +0300	[thread overview]
Message-ID: <20230518154424.62urbguy4rxetkty@fpc> (raw)
In-Reply-To: <20230518102437.4443-1-hdanton@sina.com>

On Thu, May 18, 2023 at 06:24:37PM +0800, Hillf Danton wrote:
> Fedor Pchelkin <pchelkin@ispras.ru> writes:
> 
> > On Wed, Apr 26, 2023 at 07:07:08AM +0800, Hillf Danton wrote: 
> >> Given similar wait timeout[1], just taking lock on the waiter side is not
> >> enough wrt fixing the race, because in case job done on the waker side,
> >> waiter needs to wait again after timeout.
> >> 
> >
> > As I understand you correctly, you mean the case when a timeout occurs
> > during ath9k_wmi_ctrl_rx() callback execution. I suppose if a timeout has
> > occurred on a waiter's side, it should return immediately and doesn't have
> > to care in which state the callback has been at that moment.
> >
> > AFAICS, this is controlled properly with taking a wmi_lock on waiter and
> > waker sides, and there is no data corruption.
> >
> > If a callback has not managed to do its work entirely (performing a
> > completion and subsequently waking waiting thread is included here), then,
> > well, it is considered a timeout, in my opinion.
> >
> > Your suggestion makes a wmi_cmd call to give a little more chance for the
> > belated callback to complete (although timeout has actually expired). That
> > is probably good, but increasing a timeout value makes that job, too. I
> > don't think it makes any sense on real hardware.
> >
> > Or do you mean there is data corruption that is properly fixed with your patch?
> 
> Given complete() not paired with wait_for_completion(), what is the
> difference after this patch?

The main thing in the patch is making ath9k_wmi_ctrl_rx() release wmi_lock
after calling ath9k_wmi_rsp_callback() which does copying data into the
shared wmi->cmd_rsp_buf buffer. Otherwise there can occur a data
corrupting scenario outlined in the patch description (added it here,
too).

On Tue, 25 Apr 2023 22:26:06 +0300, Fedor Pchelkin wrote:
> CPU0					CPU1
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   ath9k_wmi_cmd_issue(...)
>   wait_for_completion_timeout(...)
>   ---
>   timeout
>   ---
> 					/* the callback is being processed
> 					 * before last_seq_id became zero
> 					 */
> 					ath9k_wmi_ctrl_rx(...)
> 					  spin_lock_irqsave(...)
> 					  /* wmi->last_seq_id check here
> 					   * doesn't detect timeout yet
> 					   */
> 					  spin_unlock_irqrestore(...)
>   /* last_seq_id is zeroed to
>    * indicate there was a timeout
>    */
>   wmi->last_seq_id = 0
>   mutex_unlock(&wmi->op_mutex)
>   return -ETIMEDOUT
> 
> ath9k_wmi_cmd(...)
>   mutex_lock(&wmi->op_mutex)
>   /* the buffer is replaced with
>    * another one
>    */
>   wmi->cmd_rsp_buf = rsp_buf
>   wmi->cmd_rsp_len = rsp_len
>   ath9k_wmi_cmd_issue(...)
>     spin_lock_irqsave(...)
>     spin_unlock_irqrestore(...)
>   wait_for_completion_timeout(...)
> 					/* the continuation of the
> 					 * callback left after the first
> 					 * ath9k_wmi_cmd call
> 					 */
> 					  ath9k_wmi_rsp_callback(...)
> 					    /* copying data designated
> 					     * to already timeouted
> 					     * WMI command into an
> 					     * inappropriate wmi_cmd_buf
> 					     */
> 					    memcpy(...)
> 					    complete(&wmi->cmd_wait)
>   /* awakened by the bogus callback
>    * => invalid return result
>    */
>   mutex_unlock(&wmi->op_mutex)
>   return 0

So before the patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
wasn't helpful in case wmi->last_seq_id value was changed during
ath9k_wmi_rsp_callback() execution because of the next ath9k_wmi_cmd()
call.

With the proposed patch the wmi->last_seq_id check in ath9k_wmi_ctrl_rx()
accomplishes its job as:
 - the next ath9k_wmi_cmd call changes last_seq_id value under lock so
   it either waits for a belated ath9k_wmi_ctrl_rx() to finish or updates
   last_seq_id value so that the timeout check in ath9k_wmi_ctrl_rx()
   indicates that the waiter side has timeouted and we shouldn't further
   process the callback.
 - memcopying in ath9k_wmi_rsp_callback() is made to a valid place if
   the last_seq_id check was successful under the lock.

  parent reply	other threads:[~2023-05-18 15:44 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-15 20:21 [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin
2023-03-15 20:21 ` [PATCH 1/3] wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-03-17  5:26   ` Kalle Valo
2023-03-18 20:25     ` Fedor Pchelkin
2023-04-24 18:23   ` Fedor Pchelkin
2023-04-24 18:33     ` [PATCH v2] " Fedor Pchelkin
2023-04-25 11:14       ` Toke Høiland-Jørgensen
2023-04-28 16:52       ` Kalle Valo
2023-03-15 20:21 ` [PATCH 2/3] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-04-24 19:11   ` Fedor Pchelkin
2023-04-24 19:18     ` [PATCH v2] " Fedor Pchelkin
     [not found]       ` <20230425033832.2041-1-hdanton@sina.com>
2023-04-25  5:45         ` Kalle Valo
2023-04-25  7:54         ` Fedor Pchelkin
2023-04-25 19:26         ` [PATCH v3 1/2] " Fedor Pchelkin
2023-04-25 19:26           ` [PATCH v3 2/2] wifi: ath9k: protect WMI command response buffer replacement with a lock Fedor Pchelkin
2023-08-08 14:07             ` Toke Høiland-Jørgensen
     [not found]           ` <20230425230708.2132-1-hdanton@sina.com>
2023-04-26 19:02             ` [PATCH v3 1/2] wifi: ath9k: fix races between ath9k_wmi_cmd and ath9k_wmi_ctrl_rx Fedor Pchelkin
2023-05-15 12:06               ` Toke Høiland-Jørgensen
     [not found]               ` <20230518102437.4443-1-hdanton@sina.com>
2023-05-18 15:44                 ` Fedor Pchelkin [this message]
2023-08-08 14:06           ` Toke Høiland-Jørgensen
2023-08-22 13:35           ` Kalle Valo
2023-03-15 20:21 ` [PATCH 3/3] wifi: ath9k: fix ath9k_wmi_cmd return value when device is unplugged Fedor Pchelkin
2023-03-15 20:47 ` [PATCH 0/3] wifi: ath9k: deal with uninit memory Fedor Pchelkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230518154424.62urbguy4rxetkty@fpc \
    --to=pchelkin@ispras.ru \
    --cc=hdanton@sina.com \
    --cc=khoroshilov@ispras.ru \
    --cc=kvalo@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=lvc-project@linuxtesting.org \
    --cc=netdev@vger.kernel.org \
    --cc=syzbot+f2cb6e0ffdb961921e4d@syzkaller.appspotmail.com \
    --cc=toke@toke.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).