* [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
@ 2021-02-05 19:47 Sagi Grimberg
2021-02-07 16:16 ` Christoph Hellwig
2021-02-12 7:55 ` Grupi, Elad
0 siblings, 2 replies; 5+ messages in thread
From: Sagi Grimberg @ 2021-02-05 19:47 UTC (permalink / raw)
To: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni; +Cc: Elad Grupi
When we accept a TCP connection and allocate an nvmet-tcp queue we should
make sure not to fully establish it or reference it, as the connection may
already be closing, which triggers queue release work, which does not
fence against queue establishment.
In order to address such a race, we make sure to check the sk_state and
take the queue reference underneath the sk_callback_lock, such that the
queue release work correctly fences against it.
Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/target/tcp.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 577ce7d403ae..8b0485ada315 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1485,17 +1485,27 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
if (inet->rcv_tos > 0)
ip_sock_set_tos(sock->sk, inet->rcv_tos);
+ ret = 0;
write_lock_bh(&sock->sk->sk_callback_lock);
- sock->sk->sk_user_data = queue;
- queue->data_ready = sock->sk->sk_data_ready;
- sock->sk->sk_data_ready = nvmet_tcp_data_ready;
- queue->state_change = sock->sk->sk_state_change;
- sock->sk->sk_state_change = nvmet_tcp_state_change;
- queue->write_space = sock->sk->sk_write_space;
- sock->sk->sk_write_space = nvmet_tcp_write_space;
+ if (sock->sk->sk_state != TCP_ESTABLISHED) {
+ /*
+ * If the socket is already closing, don't even start
+ * consuming it
+ */
+ ret = -ENOTCONN;
+ } else {
+ sock->sk->sk_user_data = queue;
+ queue->data_ready = sock->sk->sk_data_ready;
+ sock->sk->sk_data_ready = nvmet_tcp_data_ready;
+ queue->state_change = sock->sk->sk_state_change;
+ sock->sk->sk_state_change = nvmet_tcp_state_change;
+ queue->write_space = sock->sk->sk_write_space;
+ sock->sk->sk_write_space = nvmet_tcp_write_space;
+ queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
+ }
write_unlock_bh(&sock->sk->sk_callback_lock);
- return 0;
+ return ret;
}
static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
@@ -1543,8 +1553,6 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
if (ret)
goto out_destroy_sq;
- queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
-
return 0;
out_destroy_sq:
mutex_lock(&nvmet_tcp_queue_mutex);
--
2.27.0
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
2021-02-05 19:47 [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work Sagi Grimberg
@ 2021-02-07 16:16 ` Christoph Hellwig
2021-02-12 7:55 ` Grupi, Elad
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2021-02-07 16:16 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Chaitanya Kulkarni, Elad Grupi, Christoph Hellwig,
linux-nvme
Thanks,
applied to nvme-5.12.
* RE: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
2021-02-05 19:47 [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work Sagi Grimberg
2021-02-07 16:16 ` Christoph Hellwig
@ 2021-02-12 7:55 ` Grupi, Elad
2021-02-12 22:22 ` Sagi Grimberg
1 sibling, 1 reply; 5+ messages in thread
From: Grupi, Elad @ 2021-02-12 7:55 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme@lists.infradead.org, Christoph Hellwig,
Keith Busch, Chaitanya Kulkarni
Sagi,
I think there is another race in that solution.
Please consider following flow:
Thread 1 takes sk_callback_lock and goes into the else block (state is TCP_ESTABLISHED), reaching the line:
sock->sk->sk_user_data = queue;
At that point, thread 2 invokes sk_state_change, but this is still the original state_change callback because we have not yet set it to nvmet_tcp_state_change.
Thread 1 continues and sets the callback to nvmet_tcp_state_change, but sk_state_change was already called, so nvmet_tcp_state_change will never be called.
In that case, there is a leak, and the queue will never be removed.
Elad
-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me>
Sent: Friday, 5 February 2021 21:47
To: linux-nvme@lists.infradead.org; Christoph Hellwig; Keith Busch; Chaitanya Kulkarni
Cc: Grupi, Elad
Subject: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
When we accept a TCP connection and allocate an nvmet-tcp queue we should make sure not to fully establish it or reference it, as the connection may already be closing, which triggers queue release work, which does not fence against queue establishment.
In order to address such a race, we make sure to check the sk_state and take the queue reference underneath the sk_callback_lock, such that the queue release work correctly fences against it.
Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/target/tcp.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 577ce7d403ae..8b0485ada315 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1485,17 +1485,27 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
if (inet->rcv_tos > 0)
ip_sock_set_tos(sock->sk, inet->rcv_tos);
+ ret = 0;
write_lock_bh(&sock->sk->sk_callback_lock);
- sock->sk->sk_user_data = queue;
- queue->data_ready = sock->sk->sk_data_ready;
- sock->sk->sk_data_ready = nvmet_tcp_data_ready;
- queue->state_change = sock->sk->sk_state_change;
- sock->sk->sk_state_change = nvmet_tcp_state_change;
- queue->write_space = sock->sk->sk_write_space;
- sock->sk->sk_write_space = nvmet_tcp_write_space;
+ if (sock->sk->sk_state != TCP_ESTABLISHED) {
+ /*
+ * If the socket is already closing, don't even start
+ * consuming it
+ */
+ ret = -ENOTCONN;
+ } else {
+ sock->sk->sk_user_data = queue;
+ queue->data_ready = sock->sk->sk_data_ready;
+ sock->sk->sk_data_ready = nvmet_tcp_data_ready;
+ queue->state_change = sock->sk->sk_state_change;
+ sock->sk->sk_state_change = nvmet_tcp_state_change;
+ queue->write_space = sock->sk->sk_write_space;
+ sock->sk->sk_write_space = nvmet_tcp_write_space;
+ queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
+ }
write_unlock_bh(&sock->sk->sk_callback_lock);
- return 0;
+ return ret;
}
static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
@@ -1543,8 +1553,6 @@ static int nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
if (ret)
goto out_destroy_sq;
- queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
-
return 0;
out_destroy_sq:
mutex_lock(&nvmet_tcp_queue_mutex);
--
2.27.0
* Re: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
2021-02-12 7:55 ` Grupi, Elad
@ 2021-02-12 22:22 ` Sagi Grimberg
2021-02-15 13:00 ` Grupi, Elad
0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2021-02-12 22:22 UTC (permalink / raw)
To: Grupi, Elad, linux-nvme@lists.infradead.org, Christoph Hellwig,
Keith Busch, Chaitanya Kulkarni
> Sagi,
>
> I think there is another race in that solution.
>
> Please consider following flow:
>
> Thread 1 takes sk_callback_lock and goes into the else block (state is TCP_ESTABLISHED), reaching the line:
> sock->sk->sk_user_data = queue;
>
> At that point, thread 2 invokes sk_state_change, but this is still the original state_change callback because we have not yet set it to nvmet_tcp_state_change.
>
> Thread 1 continues and sets the callback to nvmet_tcp_state_change, but sk_state_change was already called, so nvmet_tcp_state_change will never be called.
>
> In that case, there is a leak, and the queue will never be removed.
Are you saying that this is needed?
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ac2d9ed23cea..d82df6cca801 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1486,6 +1486,7 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
ret = 0;
write_lock_bh(&sock->sk->sk_callback_lock);
+ lock_sock(sock->sk);
if (sock->sk->sk_state != TCP_ESTABLISHED) {
/*
* If the socket is already closing, don't even start
@@ -1502,6 +1503,7 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
sock->sk->sk_write_space = nvmet_tcp_write_space;
queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
}
+ release_sock(sock->sk);
write_unlock_bh(&sock->sk->sk_callback_lock);
return ret;
--
* RE: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
2021-02-12 22:22 ` Sagi Grimberg
@ 2021-02-15 13:00 ` Grupi, Elad
0 siblings, 0 replies; 5+ messages in thread
From: Grupi, Elad @ 2021-02-15 13:00 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme@lists.infradead.org, Christoph Hellwig,
Keith Busch, Chaitanya Kulkarni
Yes. That's about what I meant.
Does lock_sock protect against invoking socket callbacks?
-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me>
Sent: Saturday, 13 February 2021 0:22
To: Grupi, Elad; linux-nvme@lists.infradead.org; Christoph Hellwig; Keith Busch; Chaitanya Kulkarni
Subject: Re: [PATCH] nvmet-tcp: fix potential race of tcp socket closing accept_work
> Sagi,
>
> I think there is another race in that solution.
>
> Please consider following flow:
>
> Thread 1 takes sk_callback_lock and goes into the else block (state is TCP_ESTABLISHED), reaching the line:
> sock->sk->sk_user_data = queue;
>
> At that point, thread 2 invokes sk_state_change, but this is still the original state_change callback because we have not yet set it to nvmet_tcp_state_change.
>
> Thread 1 continues and sets the callback to nvmet_tcp_state_change, but sk_state_change was already called, so nvmet_tcp_state_change will never be called.
>
> In that case, there is a leak, and the queue will never be removed.
Are you saying that this is needed?
--
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ac2d9ed23cea..d82df6cca801 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1486,6 +1486,7 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
ret = 0;
write_lock_bh(&sock->sk->sk_callback_lock);
+ lock_sock(sock->sk);
if (sock->sk->sk_state != TCP_ESTABLISHED) {
/*
* If the socket is already closing, don't even start
@@ -1502,6 +1503,7 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
sock->sk->sk_write_space = nvmet_tcp_write_space;
queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
}
+ release_sock(sock->sk);
write_unlock_bh(&sock->sk->sk_callback_lock);
return ret;
--