From: Taehee Yoo <ap420073@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org, kbusch@kernel.org, axboe@fb.com,
hch@lst.de, kch@nvidia.com
Cc: larrystevenwise@gmail.com, anthony.j.knapp@intel.com,
pizhenwei@bytedance.com
Subject: Re: [PATCH 4/4] nvmet-tcp: fix memory leak in nvmet_tcp_free_cmd_data_in_buffers()
Date: Wed, 4 Jan 2023 17:44:13 +0900 [thread overview]
Message-ID: <f9754b2f-a260-9d38-355d-8923399c1e7c@gmail.com> (raw)
In-Reply-To: <f49ee5c1-cea7-86fc-b92a-4423a94389be@grimberg.me>
Hi Sagi,
Thank you so much for your review!
On 1/3/23 19:54, Sagi Grimberg wrote:
>
>> While tcp socket is being released, nvmet_tcp_release_queue_work() is
>> called.
>> It calls nvmet_tcp_free_cmd_data_in_buffers() to free CMD resources.
>> But it may skip freeing resources due to unnecessary condition checks.
>> So, the memory leak will occur.
>>
>> In order to fix this problem, it removes unnecessary condition checks
>> in nvmet_tcp_free_cmd_data_in_buffers().
>>
>> This memory leak issue will occur in the target machine when a host
>> sends reset command to a target.
>>
>> Reproducer:
>> while :
>> do
>> echo 1 > /sys/class/nvme/nvme<NS>/reset_controller
>> done
>>
>> unreferenced object 0xffff88814a5c6da0 (size 32):
>> comm "kworker/2:1H", pid 176, jiffies 4305953739 (age 72707.743s)
>> hex dump (first 32 bytes):
>> 82 84 c8 04 00 ea ff ff 00 00 00 00 00 04 00 00 ................
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffffaa795cf7>] __kmalloc+0x47/0xc0
>> [<ffffffffab030a72>] sgl_alloc_order+0x82/0x3a0
>> [<ffffffffc086593c>] nvmet_tcp_map_data+0x1bc/0x570 [nvmet_tcp]
>> [<ffffffffc086a7d4>] nvmet_tcp_try_recv_pdu+0x7f4/0x1e20
[nvmet_tcp]
>> [<ffffffffc086c5a0>] nvmet_tcp_io_work+0x120/0x3272 [nvmet_tcp]
>> [<ffffffffaa1b059d>] process_one_work+0x81d/0x1450
>> [<ffffffffaa1b177c>] worker_thread+0x5ac/0xed0
>> [<ffffffffaa1c8ccf>] kthread+0x29f/0x340
>> [<ffffffffaa0034cf>] ret_from_fork+0x1f/0x30
>> unreferenced object 0xffff888153f3e1c0 (size 16):
>> comm "kworker/2:1H", pid 176, jiffies 4305953739 (age 72707.743s)
>> hex dump (first 16 bytes):
>> 80 84 c8 04 00 ea ff ff 00 04 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffffaa795cf7>] __kmalloc+0x47/0xc0
>> [<ffffffffc0865a80>] nvmet_tcp_map_data+0x300/0x570 [nvmet_tcp]
>> [<ffffffffc086a7d4>] nvmet_tcp_try_recv_pdu+0x7f4/0x1e20
[nvmet_tcp]
>> [<ffffffffc086c5a0>] nvmet_tcp_io_work+0x120/0x3272 [nvmet_tcp]
>> [<ffffffffaa1b059d>] process_one_work+0x81d/0x1450
>> [<ffffffffaa1b177c>] worker_thread+0x5ac/0xed0
>> [<ffffffffaa1c8ccf>] kthread+0x29f/0x340
>> [<ffffffffaa0034cf>] ret_from_fork+0x1f/0x30
>>
>> Fixes: db94f240280c ("nvmet-tcp: fix NULL pointer dereference during
>> release")
>> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
>> ---
>> drivers/nvme/target/tcp.c | 9 +++------
>> 1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
>> index cc05c094de22..dac08603fec9 100644
>> --- a/drivers/nvme/target/tcp.c
>> +++ b/drivers/nvme/target/tcp.c
>> @@ -1427,13 +1427,10 @@ static void
>> nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)
>> struct nvmet_tcp_cmd *cmd = queue->cmds;
>> int i;
>> - for (i = 0; i < queue->nr_cmds; i++, cmd++) {
>> - if (nvmet_tcp_need_data_in(cmd))
>> - nvmet_tcp_free_cmd_buffers(cmd);
>> - }
>> + for (i = 0; i < queue->nr_cmds; i++, cmd++)
>> + nvmet_tcp_free_cmd_buffers(cmd);
>> - if (!queue->nr_cmds && nvmet_tcp_need_data_in(&queue->connect))
>> - nvmet_tcp_free_cmd_buffers(&queue->connect);
>> + nvmet_tcp_free_cmd_buffers(&queue->connect);
>> }
>> static void nvmet_tcp_release_queue_work(struct work_struct *w)
>
> Would it be possible to understand what are the commands that are
> leaking here? commands that do not wait for data from the host, should
> have a normal path of release, if that is not happening, we may be in
> risk for use-after-free or still have a leak somewhere.
I tested with a reproducer and a debug patch like below.
#SHELL 1
while :
do
echo 1 > /sys/class/nvme/nvme0/reset_controller
done
#SHELL 2
mount /dev/nvme0n1p66 ~/mnt
while :
do
echo "aaaaaaaaaaaaa" > ~/mnt/test.txt
done
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index cc05c094de22..0cdf83b636a2 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1428,8 +1428,13 @@ static void
nvmet_tcp_free_cmd_data_in_buffers(struct nvmet_tcp_queue *queue)
int i;
for (i = 0; i < queue->nr_cmds; i++, cmd++) {
- if (nvmet_tcp_need_data_in(cmd))
+ if (nvmet_tcp_need_data_in(cmd)) {
nvmet_tcp_free_cmd_buffers(cmd);
+ } else {
+ if (cmd->req.sg) {
+ printk("[TEST]%s %u cmd = %x exec =
%pS\n", __func__, __LINE__, cmd->cmd_pdu->cmd.common.opcode,
cmd->req.execute);
+ }
+ }
}
Results:
[ 393.244102] [TEST]nvmet_tcp_free_cmd_data_in_buffers 1435 cmd = 1
exec = nvmet_bdev_execute_rw+0x0/0xc20
[ 393.607385] [TEST]nvmet_tcp_free_cmd_data_in_buffers 1435 cmd = 1
exec = nvmet_bdev_execute_rw+0x0/0xc20
[ 393.717213] [TEST]nvmet_tcp_free_cmd_data_in_buffers 1435 cmd = 1
exec = nvmet_bdev_execute_rw+0x0/0xc20
[ 393.825434] [TEST]nvmet_tcp_free_cmd_data_in_buffers 1435 cmd = 1
exec = nvmet_bdev_execute_rw+0x0/0xc20
[ 393.952042] [TEST]nvmet_tcp_free_cmd_data_in_buffers 1435 cmd = 1
exec = nvmet_bdev_execute_rw+0x0/0xc20
I think these are not only the leaking case.
Possibly there are more cases.
But I can't test all the cases.
I'm not sure that this info is what really you wanted.
If this is not satisfactory info, please let me know.
Thanks,
Taehee Yoo
prev parent reply other threads:[~2023-01-04 8:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-03 10:03 [PATCH 0/4] nvme: fix several bugs in nvme-fabric Taehee Yoo
2023-01-03 10:03 ` [PATCH 1/4] nvme: fix delete uninitialized controller Taehee Yoo
2023-01-03 10:30 ` Sagi Grimberg
2023-01-04 0:24 ` Chaitanya Kulkarni
2023-01-04 2:42 ` Taehee Yoo
2023-01-03 10:03 ` [PATCH 2/4] nvme: fix reset " Taehee Yoo
2023-01-03 10:32 ` Sagi Grimberg
2023-01-03 10:03 ` [PATCH 3/4] nvmet: fix hang in nvmet_ns_disable() Taehee Yoo
2023-01-03 10:58 ` Sagi Grimberg
2023-01-04 0:32 ` Chaitanya Kulkarni
2023-01-04 8:56 ` Taehee Yoo
2023-01-03 10:03 ` [PATCH 4/4] nvmet-tcp: fix memory leak in nvmet_tcp_free_cmd_data_in_buffers() Taehee Yoo
2023-01-03 10:54 ` Sagi Grimberg
2023-01-04 8:44 ` Taehee Yoo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f9754b2f-a260-9d38-355d-8923399c1e7c@gmail.com \
--to=ap420073@gmail.com \
--cc=anthony.j.knapp@intel.com \
--cc=axboe@fb.com \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=kch@nvidia.com \
--cc=larrystevenwise@gmail.com \
--cc=linux-nvme@lists.infradead.org \
--cc=pizhenwei@bytedance.com \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox