* [PATCH v2] nvme-tcp: fix a possible UAF when failing to allocate an io queue
From: Sagi Grimberg @ 2023-03-19 13:14 UTC
To: linux-nvme
Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Yanjun Zhang
When we allocate an nvme-tcp queue, we set the data_ready callback before
we actually need to use it. This creates the potential that if a stray
controller sends us data on the socket before we connect, we can trigger
the io_work and start consuming the socket.

In the reported case, we failed to allocate one of the io queues, and as
we started releasing the queues that we had already allocated, we got a
UAF [1] from the io_work, which was running before it should have been.

Fix this by setting the socket ops callbacks only when we start the
queue, so that we cannot accidentally schedule the io_work during the
initialization phase, before the queue has started.
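
For context, here is a simplified sketch of the data_ready path
(paraphrased, not the exact driver code; the polling check is omitted).
Once this callback is installed on the socket, any incoming data is
enough to kick io_work:

static void nvme_tcp_data_ready(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	/* data arriving on the socket schedules io_work */
	if (likely(queue && queue->rd_enabled))
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	read_unlock_bh(&sk->sk_callback_lock);
}
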
[1]:
[16802.107284] nvme nvme4: starting error recovery
[16802.109166] nvme nvme4: Reconnecting in 10 seconds...
[16812.173535] nvme nvme4: failed to connect socket: -111
[16812.173745] nvme nvme4: Failed reconnect attempt 1
[16812.173747] nvme nvme4: Reconnecting in 10 seconds...
[16822.413555] nvme nvme4: failed to connect socket: -111
[16822.413762] nvme nvme4: Failed reconnect attempt 2
[16822.413765] nvme nvme4: Reconnecting in 10 seconds...
[16832.661274] nvme nvme4: creating 32 I/O queues.
[16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088
[16833.920068] nvme nvme4: Failed reconnect attempt 3
[16833.920094] #PF: supervisor write access in kernel mode
[16833.920261] nvme nvme4: Reconnecting in 10 seconds...
[16833.920368] #PF: error_code(0x0002) - not-present page
[16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
[16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
...
[16833.923138] Call Trace:
[16833.923271] <TASK>
[16833.923402] lock_sock_nested+0x1e/0x50
[16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp]
[16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp]
[16833.923824] process_one_work+0x1e8/0x390
[16833.923969] worker_thread+0x53/0x3d0
[16833.924104] ? process_one_work+0x390/0x390
[16833.924240] kthread+0x124/0x150
[16833.924376] ? set_kthread_struct+0x50/0x50
[16833.924518] ret_from_fork+0x1f/0x30
[16833.924655] </TASK>
Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
Changes from v1:
- Fix silly compilation error

Yanjun, I'll be waiting for your Tested-by tag. I need it
in order to apply this fix.
drivers/nvme/host/tcp.c | 37 +++++++++++++++++++------------------
1 file changed, 19 insertions(+), 18 deletions(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 42c0598c31f2..4ef614bf201c 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1620,22 +1620,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
if (ret)
goto err_init_connect;
- queue->rd_enabled = true;
set_bit(NVME_TCP_Q_ALLOCATED, &queue->flags);
- nvme_tcp_init_recv_ctx(queue);
-
- write_lock_bh(&queue->sock->sk->sk_callback_lock);
- queue->sock->sk->sk_user_data = queue;
- queue->state_change = queue->sock->sk->sk_state_change;
- queue->data_ready = queue->sock->sk->sk_data_ready;
- queue->write_space = queue->sock->sk->sk_write_space;
- queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
- queue->sock->sk->sk_state_change = nvme_tcp_state_change;
- queue->sock->sk->sk_write_space = nvme_tcp_write_space;
-#ifdef CONFIG_NET_RX_BUSY_POLL
- queue->sock->sk->sk_ll_usec = 1;
-#endif
- write_unlock_bh(&queue->sock->sk->sk_callback_lock);
return 0;
@@ -1691,18 +1676,34 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
{
struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+ struct nvme_tcp_queue *queue = &ctrl->queues[idx];
int ret;
+ queue->rd_enabled = true;
+ nvme_tcp_init_recv_ctx(queue);
+ write_lock_bh(&queue->sock->sk->sk_callback_lock);
+ queue->sock->sk->sk_user_data = queue;
+ queue->state_change = queue->sock->sk->sk_state_change;
+ queue->data_ready = queue->sock->sk->sk_data_ready;
+ queue->write_space = queue->sock->sk->sk_write_space;
+ queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
+ queue->sock->sk->sk_state_change = nvme_tcp_state_change;
+ queue->sock->sk->sk_write_space = nvme_tcp_write_space;
+#ifdef CONFIG_NET_RX_BUSY_POLL
+ queue->sock->sk->sk_ll_usec = 1;
+#endif
+ write_unlock_bh(&queue->sock->sk->sk_callback_lock);
+
if (idx)
ret = nvmf_connect_io_queue(nctrl, idx);
else
ret = nvmf_connect_admin_queue(nctrl);
if (!ret) {
- set_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags);
+ set_bit(NVME_TCP_Q_LIVE, &queue->flags);
} else {
- if (test_bit(NVME_TCP_Q_ALLOCATED, &ctrl->queues[idx].flags))
- __nvme_tcp_stop_queue(&ctrl->queues[idx]);
+ if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
+ __nvme_tcp_stop_queue(queue);
dev_err(nctrl->device,
"failed to connect queue: %d ret=%d\n", idx, ret);
}
--
2.34.1
* Re: [PATCH v2] nvme-tcp: fix a possible UAF when failing to allocate an io queue
From: Hannes Reinecke @ 2023-03-20 12:51 UTC
To: linux-nvme
On 3/19/23 14:14, Sagi Grimberg wrote:
> When we allocate an nvme-tcp queue, we set the data_ready callback before
> we actually need to use it. This creates the potential that if a stray
> controller sends us data on the socket before we connect, we can trigger
> the io_work and start consuming the socket.
>
> In the reported case, we failed to allocate one of the io queues, and as
> we started releasing the queues that we had already allocated, we got a
> UAF [1] from the io_work, which was running before it should have been.
>
> Fix this by setting the socket ops callbacks only when we start the
> queue, so that we cannot accidentally schedule the io_work during the
> initialization phase, before the queue has started.
>
> [1]:
> [16802.107284] nvme nvme4: starting error recovery
> [16802.109166] nvme nvme4: Reconnecting in 10 seconds...
> [16812.173535] nvme nvme4: failed to connect socket: -111
> [16812.173745] nvme nvme4: Failed reconnect attempt 1
> [16812.173747] nvme nvme4: Reconnecting in 10 seconds...
> [16822.413555] nvme nvme4: failed to connect socket: -111
> [16822.413762] nvme nvme4: Failed reconnect attempt 2
> [16822.413765] nvme nvme4: Reconnecting in 10 seconds...
> [16832.661274] nvme nvme4: creating 32 I/O queues.
> [16833.919887] BUG: kernel NULL pointer dereference, address: 0000000000000088
> [16833.920068] nvme nvme4: Failed reconnect attempt 3
> [16833.920094] #PF: supervisor write access in kernel mode
> [16833.920261] nvme nvme4: Reconnecting in 10 seconds...
> [16833.920368] #PF: error_code(0x0002) - not-present page
> [16833.921086] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp]
> [16833.921191] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
> ...
> [16833.923138] Call Trace:
> [16833.923271] <TASK>
> [16833.923402] lock_sock_nested+0x1e/0x50
> [16833.923545] nvme_tcp_try_recv+0x40/0xa0 [nvme_tcp]
> [16833.923685] nvme_tcp_io_work+0x68/0xa0 [nvme_tcp]
> [16833.923824] process_one_work+0x1e8/0x390
> [16833.923969] worker_thread+0x53/0x3d0
> [16833.924104] ? process_one_work+0x390/0x390
> [16833.924240] kthread+0x124/0x150
> [16833.924376] ? set_kthread_struct+0x50/0x50
> [16833.924518] ret_from_fork+0x1f/0x30
> [16833.924655] </TASK>
>
> Reported-by: Yanjun Zhang <zhangyanjun@cestc.cn>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> Changes from v1:
> - Fix silly compilation error
>
> Yanjun, I'll be waiting for your Tested-by tag. I need it
> in order to apply this fix.
>
> drivers/nvme/host/tcp.c | 37 +++++++++++++++++++------------------
> 1 file changed, 19 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 42c0598c31f2..4ef614bf201c 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -1620,22 +1620,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid)
> if (ret)
> goto err_init_connect;
>
> - queue->rd_enabled = true;
> set_bit(NVME_TCP_Q_ALLOCATED, &queue->flags);
> - nvme_tcp_init_recv_ctx(queue);
> -
> - write_lock_bh(&queue->sock->sk->sk_callback_lock);
> - queue->sock->sk->sk_user_data = queue;
> - queue->state_change = queue->sock->sk->sk_state_change;
> - queue->data_ready = queue->sock->sk->sk_data_ready;
> - queue->write_space = queue->sock->sk->sk_write_space;
> - queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
> - queue->sock->sk->sk_state_change = nvme_tcp_state_change;
> - queue->sock->sk->sk_write_space = nvme_tcp_write_space;
> -#ifdef CONFIG_NET_RX_BUSY_POLL
> - queue->sock->sk->sk_ll_usec = 1;
> -#endif
> - write_unlock_bh(&queue->sock->sk->sk_callback_lock);
>
> return 0;
>
> @@ -1691,18 +1676,34 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
> static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
> {
> struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> + struct nvme_tcp_queue *queue = &ctrl->queues[idx];
> int ret;
>
> + queue->rd_enabled = true;
> + nvme_tcp_init_recv_ctx(queue);
> + write_lock_bh(&queue->sock->sk->sk_callback_lock);
> + queue->sock->sk->sk_user_data = queue;
> + queue->state_change = queue->sock->sk->sk_state_change;
> + queue->data_ready = queue->sock->sk->sk_data_ready;
> + queue->write_space = queue->sock->sk->sk_write_space;
> + queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
> + queue->sock->sk->sk_state_change = nvme_tcp_state_change;
> + queue->sock->sk->sk_write_space = nvme_tcp_write_space;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> + queue->sock->sk->sk_ll_usec = 1;
> +#endif
> + write_unlock_bh(&queue->sock->sk->sk_callback_lock);
> +
Can't you put this into a separate function?
(Will be needing that for TLS support anyway :-)
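
Something along these lines, perhaps (untested sketch; the helper name
is just a placeholder):

static void nvme_tcp_setup_sock_ops(struct nvme_tcp_queue *queue)
{
	write_lock_bh(&queue->sock->sk->sk_callback_lock);
	queue->sock->sk->sk_user_data = queue;
	queue->state_change = queue->sock->sk->sk_state_change;
	queue->data_ready = queue->sock->sk->sk_data_ready;
	queue->write_space = queue->sock->sk->sk_write_space;
	queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
	queue->sock->sk->sk_state_change = nvme_tcp_state_change;
	queue->sock->sk->sk_write_space = nvme_tcp_write_space;
#ifdef CONFIG_NET_RX_BUSY_POLL
	queue->sock->sk->sk_ll_usec = 1;
#endif
	write_unlock_bh(&queue->sock->sk->sk_callback_lock);
}

nvme_tcp_start_queue() would then just call it before connecting.
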
And shouldn't we consider 'rcu_write_lock_bh' and
'rcu_assign_sk_user_data()' here?
Cheers,
Hannes
* Re: [PATCH v2] nvme-tcp: fix a possible UAF when failing to allocate an io queue
From: Sagi Grimberg @ 2023-03-20 13:30 UTC
To: Hannes Reinecke, linux-nvme
>> @@ -1691,18 +1676,34 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
>> static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
>> {
>> struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
>> + struct nvme_tcp_queue *queue = &ctrl->queues[idx];
>> int ret;
>> + queue->rd_enabled = true;
>> + nvme_tcp_init_recv_ctx(queue);
>> + write_lock_bh(&queue->sock->sk->sk_callback_lock);
>> + queue->sock->sk->sk_user_data = queue;
>> + queue->state_change = queue->sock->sk->sk_state_change;
>> + queue->data_ready = queue->sock->sk->sk_data_ready;
>> + queue->write_space = queue->sock->sk->sk_write_space;
>> + queue->sock->sk->sk_data_ready = nvme_tcp_data_ready;
>> + queue->sock->sk->sk_state_change = nvme_tcp_state_change;
>> + queue->sock->sk->sk_write_space = nvme_tcp_write_space;
>> +#ifdef CONFIG_NET_RX_BUSY_POLL
>> + queue->sock->sk->sk_ll_usec = 1;
>> +#endif
>> + write_unlock_bh(&queue->sock->sk->sk_callback_lock);
>> +
> Can't you put this into a separate function?
> (Will be needing that for TLS support anyway :-)
Sure.
> And shouldn't we consider 'rcu_write_lock_bh'
I don't know what that is...
> and 'rcu_assign_sk_user_data()' here?
I would like to understand what race condition would be possible here,
given that sk_user_data is already accessed under the sk_callback_lock
rwlock.
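
For reference, roughly the pattern we rely on today (simplified sketch,
not verbatim driver code):

	/* setup side, when installing the callbacks: */
	write_lock_bh(&sk->sk_callback_lock);
	sk->sk_user_data = queue;
	sk->sk_data_ready = nvme_tcp_data_ready;
	write_unlock_bh(&sk->sk_callback_lock);

	/* callback side, in nvme_tcp_data_ready(): */
	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (queue && queue->rd_enabled)
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	read_unlock_bh(&sk->sk_callback_lock);

The reader and the writer serialize on the same sk_callback_lock, so I'm
not sure what rcu_assign_sk_user_data() would buy us here.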