* blktests nvme 041,042 leak memory @ 2024-05-26 19:47 Sagi Grimberg 2024-05-27 1:11 ` Yi Zhang 0 siblings, 1 reply; 19+ messages in thread From: Sagi Grimberg @ 2024-05-26 19:47 UTC (permalink / raw) To: open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke Just noticed that these tests generate a kmemleak complaint. -- [ 1889.516324] kmemleak: 2 new suspected memory leaks (see /sys/kernel/debug/kmemleak) [13221.498042] loop6: detected capacity change from 0 to 8 ^C root@nvme:~# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff91488fb07200 (size 256): comm "nvme", pid 29344, jiffies 4295069699 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 72 b0 8f 48 91 ff ff .........r..H... 08 72 b0 8f 48 91 ff ff 00 d6 55 a2 ff ff ff ff .r..H.....U..... backtrace (crc 36f6c278): [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 [<ffffffffa2562fd0>] device_add+0x510/0x8c0 [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] [<ffffffffc13b0af6>] 0xffffffffc13b0af6 [<ffffffffc120e565>] 0xffffffffc120e565 [<ffffffffa1edeb14>] vfs_write+0x104/0x490 [<ffffffffa1edf263>] ksys_write+0x73/0x100 [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff91488ed7ea00 (size 256): comm "nvme", pid 37478, jiffies 4295155857 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 08 ea d7 8e 48 91 ff ff ............H... 08 ea d7 8e 48 91 ff ff 00 d6 55 a2 ff ff ff ff ....H.....U..... backtrace (crc fc1acf5): [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 [<ffffffffa2562fd0>] device_add+0x510/0x8c0 [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] [<ffffffffc13707b7>] 0xffffffffc13707b7 [<ffffffffc120e565>] 0xffffffffc120e565 [<ffffffffa1edeb14>] vfs_write+0x104/0x490 [<ffffffffa1edf263>] ksys_write+0x73/0x100 [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e -- The culprit is when failing after device_initialize/cdev_device_add... Thought I'd share it as I'm not going to address it today. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-26 19:47 blktests nvme 041,042 leak memory Sagi Grimberg @ 2024-05-27 1:11 ` Yi Zhang 2024-05-28 9:44 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Yi Zhang @ 2024-05-27 1:11 UTC (permalink / raw) To: Sagi Grimberg Cc: open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke Yeah, I also met this leak during nvme/044 https://lore.kernel.org/linux-nvme/CAHj4cs-jM_46qK2n3g9vFguq7LVBSm1h38b6Sg_j5veubontWw@mail.gmail.com/ On Mon, May 27, 2024 at 3:49 AM Sagi Grimberg <sagi@grimberg.me> wrote: > > Just noticed that these tests generate a kmemleak complaint. > > -- > [ 1889.516324] kmemleak: 2 new suspected memory leaks (see > /sys/kernel/debug/kmemleak) > [13221.498042] loop6: detected capacity change from 0 to 8 > ^C > root@nvme:~# cat /sys/kernel/debug/kmemleak > unreferenced object 0xffff91488fb07200 (size 256): > comm "nvme", pid 29344, jiffies 4295069699 > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 08 72 b0 8f 48 91 ff ff .........r..H... > 08 72 b0 8f 48 91 ff ff 00 d6 55 a2 ff ff ff ff .r..H.....U..... > backtrace (crc 36f6c278): > [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 > [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 > [<ffffffffa2562fd0>] device_add+0x510/0x8c0 > [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 > [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] > [<ffffffffc13b0af6>] 0xffffffffc13b0af6 > [<ffffffffc120e565>] 0xffffffffc120e565 > [<ffffffffa1edeb14>] vfs_write+0x104/0x490 > [<ffffffffa1edf263>] ksys_write+0x73/0x100 > [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 > [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 > [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 > [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e > unreferenced object 0xffff91488ed7ea00 (size 256): > comm "nvme", pid 37478, jiffies 4295155857 > hex dump (first 32 bytes): > 00 00 00 00 00 00 00 00 08 ea d7 8e 48 91 ff ff ............H... > 08 ea d7 8e 48 91 ff ff 00 d6 55 a2 ff ff ff ff ....H.....U..... > backtrace (crc fc1acf5): > [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 > [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 > [<ffffffffa2562fd0>] device_add+0x510/0x8c0 > [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 > [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] > [<ffffffffc13707b7>] 0xffffffffc13707b7 > [<ffffffffc120e565>] 0xffffffffc120e565 > [<ffffffffa1edeb14>] vfs_write+0x104/0x490 > [<ffffffffa1edf263>] ksys_write+0x73/0x100 > [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 > [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 > [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 > [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e > -- > > The culprit is when failing after device_initialize/cdev_device_add... > Thought I'd share it as I'm not going to address it today. > -- Best Regards, Yi Zhang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-27 1:11 ` Yi Zhang @ 2024-05-28 9:44 ` Maurizio Lombardi 2024-05-29 9:08 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-28 9:44 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke This happens because if the nvme_init_ctrl() function fails, the callers assume that put_device() isn't necessary and just execute kfree() against the ctrl structure. This patch fixes the problem for TCP, it should also work for loop but I've not tested it. http://bsdbackstore.eu/misc/0001-nvme-fix-memory-leak-when-nvme_init_ctrl-fails.patch -- 2.43.0 po 27. 5. 2024 v 3:12 odesílatel Yi Zhang <yi.zhang@redhat.com> napsal: > > Yeah, I also met this leak during nvme/044 > > https://lore.kernel.org/linux-nvme/CAHj4cs-jM_46qK2n3g9vFguq7LVBSm1h38b6Sg_j5veubontWw@mail.gmail.com/ > > On Mon, May 27, 2024 at 3:49 AM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > Just noticed that these tests generate a kmemleak complaint. > > > > -- > > [ 1889.516324] kmemleak: 2 new suspected memory leaks (see > > /sys/kernel/debug/kmemleak) > > [13221.498042] loop6: detected capacity change from 0 to 8 > > ^C > > root@nvme:~# cat /sys/kernel/debug/kmemleak > > unreferenced object 0xffff91488fb07200 (size 256): > > comm "nvme", pid 29344, jiffies 4295069699 > > hex dump (first 32 bytes): > > 00 00 00 00 00 00 00 00 08 72 b0 8f 48 91 ff ff .........r..H... > > 08 72 b0 8f 48 91 ff ff 00 d6 55 a2 ff ff ff ff .r..H.....U..... > > backtrace (crc 36f6c278): > > [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 > > [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 > > [<ffffffffa2562fd0>] device_add+0x510/0x8c0 > > [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 > > [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] > > [<ffffffffc13b0af6>] 0xffffffffc13b0af6 > > [<ffffffffc120e565>] 0xffffffffc120e565 > > [<ffffffffa1edeb14>] vfs_write+0x104/0x490 > > [<ffffffffa1edf263>] ksys_write+0x73/0x100 > > [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 > > [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 > > [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 > > [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > unreferenced object 0xffff91488ed7ea00 (size 256): > > comm "nvme", pid 37478, jiffies 4295155857 > > hex dump (first 32 bytes): > > 00 00 00 00 00 00 00 00 08 ea d7 8e 48 91 ff ff ............H... > > 08 ea d7 8e 48 91 ff ff 00 d6 55 a2 ff ff ff ff ....H.....U..... > > backtrace (crc fc1acf5): > > [<ffffffffa2b8b46a>] kmemleak_alloc+0x4a/0x90 > > [<ffffffffa1e57897>] kmalloc_trace+0x2f7/0x3a0 > > [<ffffffffa2562fd0>] device_add+0x510/0x8c0 > > [<ffffffffa1ee5b9e>] cdev_device_add+0x4e/0xc0 > > [<ffffffffc133b7a0>] nvme_init_ctrl+0x360/0x460 [nvme_core] > > [<ffffffffc13707b7>] 0xffffffffc13707b7 > > [<ffffffffc120e565>] 0xffffffffc120e565 > > [<ffffffffa1edeb14>] vfs_write+0x104/0x490 > > [<ffffffffa1edf263>] ksys_write+0x73/0x100 > > [<ffffffffa1edf319>] __x64_sys_write+0x19/0x30 > > [<ffffffffa1a0553e>] x64_sys_call+0x7e/0x25c0 > > [<ffffffffa2b7be01>] do_syscall_64+0x71/0x130 > > [<ffffffffa2c0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > -- > > > > The culprit is when failing after device_initialize/cdev_device_add... > > Thought I'd share it as I'm not going to address it today. > > > > > -- > Best Regards, > Yi Zhang > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-28 9:44 ` Maurizio Lombardi @ 2024-05-29 9:08 ` Maurizio Lombardi 2024-05-29 12:51 ` Sagi Grimberg 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-29 9:08 UTC (permalink / raw) To: Yi Zhang Cc: Sagi Grimberg, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke út 28. 5. 2024 v 11:44 odesílatel Maurizio Lombardi <mlombard@redhat.com> napsal: > > This patch fixes the problem for TCP, it should also work for loop but > I've not tested it. > > http://bsdbackstore.eu/misc/0001-nvme-fix-memory-leak-when-nvme_init_ctrl-fails.patch I updated it to fix all the fabrics, with the exception of apple's driver because I am not sure I fully understand its probe process. Why if blk_mq_alloc_queue() fails it just does put_device() and doesn't uninit the controller? Is it a mistake? Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-29 9:08 ` Maurizio Lombardi @ 2024-05-29 12:51 ` Sagi Grimberg 2024-05-29 13:45 ` Maurizio Lombardi 2024-05-29 16:25 ` Keith Busch 0 siblings, 2 replies; 19+ messages in thread From: Sagi Grimberg @ 2024-05-29 12:51 UTC (permalink / raw) To: Maurizio Lombardi, Yi Zhang Cc: open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke On 29/05/2024 12:08, Maurizio Lombardi wrote: > út 28. 5. 2024 v 11:44 odesílatel Maurizio Lombardi > <mlombard@redhat.com> napsal: >> This patch fixes the problem for TCP, it should also work for loop but >> I've not tested it. >> >> http://bsdbackstore.eu/misc/0001-nvme-fix-memory-leak-when-nvme_init_ctrl-fails.patch > I updated it to fix all the fabrics, with the exception of apple's driver > because I am not sure I fully understand its probe process. It'd be better if we didn't propagate this issue to the transport drivers. Seems that the asymmetric part is the device_private allocated in device_add but only removed in device_release (last reference). A hack like this would also make the issue go away: -- diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f62fd49c1411..8a9d51e1ccd6 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4702,6 +4702,14 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, nvme_fault_inject_fini(&ctrl->fault_inject); dev_pm_qos_hide_latency_tolerance(ctrl->device); cdev_device_del(&ctrl->cdev, ctrl->device); + /* + * this is nasty hack, but device_add allocated a device private + * dev->p, which is freed assymmetricly in device_release, and + * in order to cleanup after ourselves, and not rely on the device + * .release handler (which also calls the nvme transport free handler) + * we explicitly free the device private. + */ + kfree(ctrl->device->p); out_free_name: nvme_put_ctrl(ctrl); kfree_const(ctrl->device->kobj.name) -- But this is ugly... ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-29 12:51 ` Sagi Grimberg @ 2024-05-29 13:45 ` Maurizio Lombardi 2024-05-29 16:25 ` Keith Busch 1 sibling, 0 replies; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-29 13:45 UTC (permalink / raw) To: Sagi Grimberg Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke st 29. 5. 2024 v 14:51 odesílatel Sagi Grimberg <sagi@grimberg.me> napsal: > > -- > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index f62fd49c1411..8a9d51e1ccd6 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -4702,6 +4702,14 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct > device *dev, > nvme_fault_inject_fini(&ctrl->fault_inject); > dev_pm_qos_hide_latency_tolerance(ctrl->device); > cdev_device_del(&ctrl->cdev, ctrl->device); > + /* > + * this is nasty hack, but device_add allocated a device private > + * dev->p, which is freed assymmetricly in device_release, and > + * in order to cleanup after ourselves, and not rely on the device > + * .release handler (which also calls the nvme transport free > handler) > + * we explicitly free the device private. > + */ > + kfree(ctrl->device->p); > out_free_name: > nvme_put_ctrl(ctrl); > kfree_const(ctrl->device->kobj.name) > -- > > But this is ugly... > I like the fact that it's just a one-liner in nvme_init_ctrl, but yes... it's really a hack that may stop working if device_add()'s implementation changes. Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-29 12:51 ` Sagi Grimberg 2024-05-29 13:45 ` Maurizio Lombardi @ 2024-05-29 16:25 ` Keith Busch 2024-05-29 16:48 ` Keith Busch 2024-05-30 6:59 ` Sagi Grimberg 1 sibling, 2 replies; 19+ messages in thread From: Keith Busch @ 2024-05-29 16:25 UTC (permalink / raw) To: Sagi Grimberg Cc: Maurizio Lombardi, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke On Wed, May 29, 2024 at 03:51:09PM +0300, Sagi Grimberg wrote: > > But this is ugly... I agree. Could we not work around the device model instead? Once we call device_add successfully, we really need to pair that with a device_del. I see two ways we might be able to do this. First suggestion: move the device_add to the end so we don't have to worry about cleaning up if subsequence actions fail. And auth_init_ctrl doesn't appear to have any dependency on the ctrl->device anyway. --- diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f5d150c62955d..d86db6a193fcb 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4675,12 +4675,16 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, if (ret) goto out_release_instance; + ret = nvme_auth_init_ctrl(ctrl); + if (ret) + goto out_free_name; + nvme_get_ctrl(ctrl); cdev_init(&ctrl->cdev, &nvme_dev_fops); ctrl->cdev.owner = ops->module; ret = cdev_device_add(&ctrl->cdev, ctrl->device); if (ret) - goto out_free_name; + goto auth_free; /* * Initialize latency tolerance controls. The sysfs files won't @@ -4692,15 +4696,10 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, nvme_fault_inject_init(&ctrl->fault_inject, dev_name(ctrl->device)); nvme_mpath_init_ctrl(ctrl); - ret = nvme_auth_init_ctrl(ctrl); - if (ret) - goto out_free_cdev; return 0; -out_free_cdev: - nvme_fault_inject_fini(&ctrl->fault_inject); - dev_pm_qos_hide_latency_tolerance(ctrl->device); - cdev_device_del(&ctrl->cdev, ctrl->device); +auth_free: + nvme_auth_free(ctrl); out_free_name: nvme_put_ctrl(ctrl); kfree_const(ctrl->device->kobj.name); -- Second suggestion, let the error handling release the last reference. The trick here is to not call the "free_ctrl" callback on "new" controllers, and don't take a ctrl reference until the end. --- diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index f5d150c62955d..72c9693e1df61 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4601,7 +4601,8 @@ static void nvme_free_ctrl(struct device *dev) mutex_unlock(&nvme_subsystems_lock); } - ctrl->ops->free_ctrl(ctrl); + if (nvme_ctrl_state(ctrl) != NVME_CTRL_NEW) + ctrl->ops->free_ctrl(ctrl); if (subsys) nvme_put_subsystem(subsys); @@ -4675,7 +4676,6 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, if (ret) goto out_release_instance; - nvme_get_ctrl(ctrl); cdev_init(&ctrl->cdev, &nvme_dev_fops); ctrl->cdev.owner = ops->module; ret = cdev_device_add(&ctrl->cdev, ctrl->device); @@ -4696,13 +4696,16 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, if (ret) goto out_free_cdev; + nvme_get_ctrl(ctrl); return 0; out_free_cdev: nvme_fault_inject_fini(&ctrl->fault_inject); dev_pm_qos_hide_latency_tolerance(ctrl->device); cdev_device_del(&ctrl->cdev, ctrl->device); -out_free_name: nvme_put_ctrl(ctrl); + return ret; + +out_free_name: kfree_const(ctrl->device->kobj.name); out_release_instance: ida_free(&nvme_instance_ida, ctrl->instance); -- ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-29 16:25 ` Keith Busch @ 2024-05-29 16:48 ` Keith Busch 2024-05-30 6:59 ` Sagi Grimberg 1 sibling, 0 replies; 19+ messages in thread From: Keith Busch @ 2024-05-29 16:48 UTC (permalink / raw) To: Sagi Grimberg Cc: Maurizio Lombardi, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke On Wed, May 29, 2024 at 10:25:18AM -0600, Keith Busch wrote: > --- > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index f5d150c62955d..72c9693e1df61 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -4601,7 +4601,8 @@ static void nvme_free_ctrl(struct device *dev) > mutex_unlock(&nvme_subsystems_lock); > } > > - ctrl->ops->free_ctrl(ctrl); > + if (nvme_ctrl_state(ctrl) != NVME_CTRL_NEW) > + ctrl->ops->free_ctrl(ctrl); I think this may be problematic. If nvme_init_ctrl() is successful, there are other things that might fail before CONNECTING state has been reached, and those failures need their free_ctrl callback invoked from the "NEW" state. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-29 16:25 ` Keith Busch 2024-05-29 16:48 ` Keith Busch @ 2024-05-30 6:59 ` Sagi Grimberg 2024-05-30 7:25 ` Maurizio Lombardi 1 sibling, 1 reply; 19+ messages in thread From: Sagi Grimberg @ 2024-05-30 6:59 UTC (permalink / raw) To: Keith Busch Cc: Maurizio Lombardi, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke On 29/05/2024 19:25, Keith Busch wrote: > On Wed, May 29, 2024 at 03:51:09PM +0300, Sagi Grimberg wrote: >> But this is ugly... > I agree. Could we not work around the device model instead? Once we call > device_add successfully, we really need to pair that with a device_del. > > I see two ways we might be able to do this. > > First suggestion: move the device_add to the end so we don't have to > worry about cleaning up if subsequence actions fail. And auth_init_ctrl > doesn't appear to have any dependency on the ctrl->device anyway. > > --- > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index f5d150c62955d..d86db6a193fcb 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -4675,12 +4675,16 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > if (ret) > goto out_release_instance; > > + ret = nvme_auth_init_ctrl(ctrl); > + if (ret) > + goto out_free_name; > + > nvme_get_ctrl(ctrl); > cdev_init(&ctrl->cdev, &nvme_dev_fops); > ctrl->cdev.owner = ops->module; > ret = cdev_device_add(&ctrl->cdev, ctrl->device); > if (ret) > - goto out_free_name; > + goto auth_free; > > /* > * Initialize latency tolerance controls. The sysfs files won't > @@ -4692,15 +4696,10 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > > nvme_fault_inject_init(&ctrl->fault_inject, dev_name(ctrl->device)); > nvme_mpath_init_ctrl(ctrl); > - ret = nvme_auth_init_ctrl(ctrl); > - if (ret) > - goto out_free_cdev; > > return 0; > -out_free_cdev: > - nvme_fault_inject_fini(&ctrl->fault_inject); > - dev_pm_qos_hide_latency_tolerance(ctrl->device); > - cdev_device_del(&ctrl->cdev, ctrl->device); > +auth_free: > + nvme_auth_free(ctrl); > out_free_name: > nvme_put_ctrl(ctrl); > kfree_const(ctrl->device->kobj.name); > -- Lets just do that... and add a comment that we don't want to add function calls that can fail post cdev_device_add() because we have a cleanup problem (with a short explanation), so this is less likely to come back... ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-30 6:59 ` Sagi Grimberg @ 2024-05-30 7:25 ` Maurizio Lombardi 2024-05-30 16:29 ` Keith Busch 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-30 7:25 UTC (permalink / raw) To: Sagi Grimberg Cc: Keith Busch, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke čt 30. 5. 2024 v 8:59 odesílatel Sagi Grimberg <sagi@grimberg.me> napsal: > > > Lets just do that... and add a comment that we don't want to add > function calls that can fail post cdev_device_add() because we have a > cleanup problem (with a short explanation), so this is less likely to > come back... > Solution 1 will work even if it's not 100% perfect because after calling device_initialize() you're supposed to use put_device() to release the instance and not free the dev directly. (The nvme_put_ctrl() call in the error path just releases the additional reference taken by nvme_get_ctrl() just before the call to cdev_device_add()). Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-30 7:25 ` Maurizio Lombardi @ 2024-05-30 16:29 ` Keith Busch 2024-05-30 17:40 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Keith Busch @ 2024-05-30 16:29 UTC (permalink / raw) To: Maurizio Lombardi Cc: Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke On Thu, May 30, 2024 at 09:25:24AM +0200, Maurizio Lombardi wrote: > ct 30. 5. 2024 v 8:59 odesílatel Sagi Grimberg <sagi@grimberg.me> napsal: > > > > > > Lets just do that... and add a comment that we don't want to add > > function calls that can fail post cdev_device_add() because we have a > > cleanup problem (with a short explanation), so this is less likely to > > come back... > > > > Solution 1 will work even if it's not 100% perfect because after > calling device_initialize() > you're supposed to use put_device() to release the instance and not > free the dev directly. > > (The nvme_put_ctrl() call in the error path just releases the > additional reference taken > by nvme_get_ctrl() just before the call to cdev_device_add()). I noticed that too, if cdev_dev_add() fails we are supposed to do a put_device. This driver seems to be making it difficult to use the core device interfaces as intended. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-30 16:29 ` Keith Busch @ 2024-05-30 17:40 ` Maurizio Lombardi 2024-05-31 11:55 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-30 17:40 UTC (permalink / raw) To: Keith Busch Cc: Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke čt 30. 5. 2024 v 18:29 odesílatel Keith Busch <kbusch@kernel.org> napsal: > I noticed that too, if cdev_dev_add() fails we are supposed to do a > put_device. This driver seems to be making it difficult to use the core > device interfaces as intended. > Just an idea: maybe split nvme_init_ctrl() in two parts? One with the pre-device_initialize() stuff, I don't know, call it nvme_alloc_ctrl(); and the remaining part will be nvme_init_ctrl(). Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-30 17:40 ` Maurizio Lombardi @ 2024-05-31 11:55 ` Maurizio Lombardi 2024-05-31 14:37 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-31 11:55 UTC (permalink / raw) To: Keith Busch Cc: Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke Maybe a better idea: move device_initialize() to the very beginning of nvme_init_ctrl() initialize ctrl->instance to a negative value, so we can check in the .release() method if we have to call ida_free() or not. Now all the fabrics drivers know that in case of error they have to execute nvme_put_ctrl(). Could it work? Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-31 11:55 ` Maurizio Lombardi @ 2024-05-31 14:37 ` Maurizio Lombardi 2024-06-03 8:37 ` Hannes Reinecke 0 siblings, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-05-31 14:37 UTC (permalink / raw) To: Maurizio Lombardi Cc: Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki, Hannes Reinecke V Fri, May 31, 2024 at 01:55:03PM +0200, Maurizio Lombardi napsal(a): > Maybe a better idea: > > move device_initialize() to the very beginning of nvme_init_ctrl() > initialize ctrl->instance to a negative value, so we can check in the > .release() method if we have to call ida_free() or not. > This is how it would look like, I did some base testing with TCP diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index 371e14f0a203..f809a4fc23f9 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -886,7 +886,7 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid) } EXPORT_SYMBOL_GPL(nvme_auth_wait); -static void nvme_ctrl_auth_work(struct work_struct *work) +void nvme_ctrl_auth_work(struct work_struct *work) { struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, dhchap_auth_work); @@ -934,14 +934,13 @@ static void nvme_ctrl_auth_work(struct work_struct *work) "qid %d: authentication failed\n", q); } } +EXPORT_SYMBOL_GPL(nvme_ctrl_auth_work); int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) { struct nvme_dhchap_queue_context *chap; int i, ret; - mutex_init(&ctrl->dhchap_auth_mutex); - INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); if (!ctrl->opts) return 0; ret = nvme_auth_generate_key(ctrl->opts->dhchap_secret, diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 954f850f113a..e775c4df9af5 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4572,14 +4572,16 @@ static void nvme_free_ctrl(struct device *dev) container_of(dev, struct nvme_ctrl, ctrl_device); struct nvme_subsystem *subsys = ctrl->subsys; - if (!subsys || ctrl->instance != subsys->instance) + if (ctrl->instance >= 0 && + (!subsys || ctrl->instance != subsys->instance)) ida_free(&nvme_instance_ida, ctrl->instance); key_put(ctrl->tls_key); nvme_free_cels(ctrl); nvme_mpath_uninit(ctrl); nvme_auth_stop(ctrl); nvme_auth_free(ctrl); - __free_page(ctrl->discard_page); + if (ctrl->discard_page) + __free_page(ctrl->discard_page); free_opal_dev(ctrl->opal_dev); if (subsys) { @@ -4610,6 +4612,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags); spin_lock_init(&ctrl->lock); mutex_init(&ctrl->scan_lock); + mutex_init(&ctrl->dhchap_auth_mutex); INIT_LIST_HEAD(&ctrl->namespaces); xa_init(&ctrl->cels); init_rwsem(&ctrl->namespaces_rwsem); @@ -4621,6 +4624,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, INIT_WORK(&ctrl->async_event_work, nvme_async_event_work); INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work); INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work); + INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); + init_waitqueue_head(&ctrl->state_wq); INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work); @@ -4631,16 +4636,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) > PAGE_SIZE); - ctrl->discard_page = alloc_page(GFP_KERNEL); - if (!ctrl->discard_page) { - ret = -ENOMEM; - goto out; - } - ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); - if (ret < 0) - goto out; - ctrl->instance = ret; + ctrl->instance = -1; device_initialize(&ctrl->ctrl_device); ctrl->device = &ctrl->ctrl_device; @@ -4654,16 +4651,28 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, ctrl->device->groups = nvme_dev_attr_groups; ctrl->device->release = nvme_free_ctrl; dev_set_drvdata(ctrl->device, ctrl); + + ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); + if (ret < 0) + goto out; + + ctrl->instance = ret; ret = dev_set_name(ctrl->device, "nvme%d", ctrl->instance); if (ret) - goto out_release_instance; + goto out; + + ctrl->discard_page = alloc_page(GFP_KERNEL); + if (!ctrl->discard_page) { + ret = -ENOMEM; + goto out; + } nvme_get_ctrl(ctrl); cdev_init(&ctrl->cdev, &nvme_dev_fops); ctrl->cdev.owner = ops->module; ret = cdev_device_add(&ctrl->cdev, ctrl->device); if (ret) - goto out_free_name; + goto out_put_ctrl; /* * Initialize latency tolerance controls. The sysfs files won't @@ -4684,14 +4693,9 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, nvme_fault_inject_fini(&ctrl->fault_inject); dev_pm_qos_hide_latency_tolerance(ctrl->device); cdev_device_del(&ctrl->cdev, ctrl->device); -out_free_name: +out_put_ctrl: nvme_put_ctrl(ctrl); - kfree_const(ctrl->device->kobj.name); -out_release_instance: - ida_free(&nvme_instance_ida, ctrl->instance); out: - if (ctrl->discard_page) - __free_page(ctrl->discard_page); return ret; } EXPORT_SYMBOL_GPL(nvme_init_ctrl); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index cacc56f4bbf4..b0e378e6ca63 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -1124,6 +1124,7 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl) #ifdef CONFIG_NVME_HOST_AUTH int __init nvme_init_auth(void); void __exit nvme_exit_auth(void); +void nvme_ctrl_auth_work(struct work_struct *work); int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl); void nvme_auth_stop(struct nvme_ctrl *ctrl); int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid); diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 51a62b0c645a..bc749bebe134 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -2302,7 +2302,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops, 0 /* no quirks, we're perfect! */); if (ret) - goto out_kfree_queues; + goto out_put_ctrl; changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING); WARN_ON_ONCE(!changed); @@ -2322,12 +2322,11 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); +out_put_ctrl: nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -out_kfree_queues: - kfree(ctrl->queues); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 8b5e4327fe83..9787a4fdbb00 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2759,7 +2759,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0); if (ret) - goto out_kfree_queues; + goto out_put_ctrl; if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) { WARN_ON_ONCE(1); @@ -2782,12 +2782,11 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); +out_put_ctrl: nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -out_kfree_queues: - kfree(ctrl->queues); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c index e589915ddef8..6d1359915b6c 100644 --- a/drivers/nvme/target/loop.c +++ b/drivers/nvme/target/loop.c @@ -550,10 +550,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops, 0 /* no quirks, we're perfect! */); - if (ret) { - kfree(ctrl); + if (ret) goto out; - } if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) WARN_ON_ONCE(1); @@ -611,8 +609,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, kfree(ctrl->queues); out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); - nvme_put_ctrl(&ctrl->ctrl); out: + nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -- 2.43.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-05-31 14:37 ` Maurizio Lombardi @ 2024-06-03 8:37 ` Hannes Reinecke 2024-06-03 8:52 ` Maurizio Lombardi 2024-06-03 9:35 ` Maurizio Lombardi 0 siblings, 2 replies; 19+ messages in thread From: Hannes Reinecke @ 2024-06-03 8:37 UTC (permalink / raw) To: Maurizio Lombardi, Maurizio Lombardi Cc: Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki On 5/31/24 16:37, Maurizio Lombardi wrote: > V Fri, May 31, 2024 at 01:55:03PM +0200, Maurizio Lombardi napsal(a): >> Maybe a better idea: >> >> move device_initialize() to the very beginning of nvme_init_ctrl() >> initialize ctrl->instance to a negative value, so we can check in the >> .release() method if we have to call ida_free() or not. >> > > This is how it would look like, I did some base testing with TCP > > diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c > index 371e14f0a203..f809a4fc23f9 100644 > --- a/drivers/nvme/host/auth.c > +++ b/drivers/nvme/host/auth.c > @@ -886,7 +886,7 @@ int nvme_auth_wait(struct nvme_ctrl *ctrl, int qid) > } > EXPORT_SYMBOL_GPL(nvme_auth_wait); > > -static void nvme_ctrl_auth_work(struct work_struct *work) > +void nvme_ctrl_auth_work(struct work_struct *work) > { > struct nvme_ctrl *ctrl = > container_of(work, struct nvme_ctrl, dhchap_auth_work); > @@ -934,14 +934,13 @@ static void nvme_ctrl_auth_work(struct work_struct *work) > "qid %d: authentication failed\n", q); > } > } > +EXPORT_SYMBOL_GPL(nvme_ctrl_auth_work); > > int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) > { > struct nvme_dhchap_queue_context *chap; > int i, ret; > > - mutex_init(&ctrl->dhchap_auth_mutex); > - INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); > if (!ctrl->opts) > return 0; > ret = nvme_auth_generate_key(ctrl->opts->dhchap_secret, > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index 954f850f113a..e775c4df9af5 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -4572,14 +4572,16 @@ static void nvme_free_ctrl(struct device *dev) > container_of(dev, struct nvme_ctrl, ctrl_device); > struct nvme_subsystem *subsys = ctrl->subsys; > > - if (!subsys || ctrl->instance != subsys->instance) > + if (ctrl->instance >= 0 && > + (!subsys || ctrl->instance != subsys->instance)) > ida_free(&nvme_instance_ida, ctrl->instance); > key_put(ctrl->tls_key); > nvme_free_cels(ctrl); > nvme_mpath_uninit(ctrl); > nvme_auth_stop(ctrl); > nvme_auth_free(ctrl); > - __free_page(ctrl->discard_page); > + if (ctrl->discard_page) > + __free_page(ctrl->discard_page); > free_opal_dev(ctrl->opal_dev); > > if (subsys) { > @@ -4610,6 +4612,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags); > spin_lock_init(&ctrl->lock); > mutex_init(&ctrl->scan_lock); > + mutex_init(&ctrl->dhchap_auth_mutex); > INIT_LIST_HEAD(&ctrl->namespaces); > xa_init(&ctrl->cels); > init_rwsem(&ctrl->namespaces_rwsem); > @@ -4621,6 +4624,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > INIT_WORK(&ctrl->async_event_work, nvme_async_event_work); > INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work); > INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work); > + INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); > + > init_waitqueue_head(&ctrl->state_wq); > > INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work); > @@ -4631,16 +4636,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > > BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) > > PAGE_SIZE); > - ctrl->discard_page = alloc_page(GFP_KERNEL); > - if (!ctrl->discard_page) { > - ret = -ENOMEM; > - goto out; > - } > > - ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); > - if (ret < 0) > - goto out; > - ctrl->instance = ret; > + ctrl->instance = -1; > > device_initialize(&ctrl->ctrl_device); > ctrl->device = &ctrl->ctrl_device; > @@ -4654,16 +4651,28 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > ctrl->device->groups = nvme_dev_attr_groups; > ctrl->device->release = nvme_free_ctrl; > dev_set_drvdata(ctrl->device, ctrl); > + > + ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); > + if (ret < 0) > + goto out; > + > + ctrl->instance = ret; > ret = dev_set_name(ctrl->device, "nvme%d", ctrl->instance); > if (ret) > - goto out_release_instance; > + goto out; > + > + ctrl->discard_page = alloc_page(GFP_KERNEL); > + if (!ctrl->discard_page) { > + ret = -ENOMEM; > + goto out; > + } > > nvme_get_ctrl(ctrl); > cdev_init(&ctrl->cdev, &nvme_dev_fops); > ctrl->cdev.owner = ops->module; > ret = cdev_device_add(&ctrl->cdev, ctrl->device); > if (ret) > - goto out_free_name; > + goto out_put_ctrl; > > /* > * Initialize latency tolerance controls. The sysfs files won't > @@ -4684,14 +4693,9 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, > nvme_fault_inject_fini(&ctrl->fault_inject); > dev_pm_qos_hide_latency_tolerance(ctrl->device); > cdev_device_del(&ctrl->cdev, ctrl->device); > -out_free_name: > +out_put_ctrl: > nvme_put_ctrl(ctrl); > - kfree_const(ctrl->device->kobj.name); > -out_release_instance: > - ida_free(&nvme_instance_ida, ctrl->instance); > out: > - if (ctrl->discard_page) > - __free_page(ctrl->discard_page); > return ret; > } > EXPORT_SYMBOL_GPL(nvme_init_ctrl); > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > index cacc56f4bbf4..b0e378e6ca63 100644 > --- a/drivers/nvme/host/nvme.h > +++ b/drivers/nvme/host/nvme.h > @@ -1124,6 +1124,7 @@ static inline bool nvme_ctrl_sgl_supported(struct nvme_ctrl *ctrl) > #ifdef CONFIG_NVME_HOST_AUTH > int __init nvme_init_auth(void); > void __exit nvme_exit_auth(void); > +void nvme_ctrl_auth_work(struct work_struct *work); > int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl); > void nvme_auth_stop(struct nvme_ctrl *ctrl); > int nvme_auth_negotiate(struct nvme_ctrl *ctrl, int qid); > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c > index 51a62b0c645a..bc749bebe134 100644 > --- a/drivers/nvme/host/rdma.c > +++ b/drivers/nvme/host/rdma.c > @@ -2302,7 +2302,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, > ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops, > 0 /* no quirks, we're perfect! */); > if (ret) > - goto out_kfree_queues; > + goto out_put_ctrl; > > changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING); > WARN_ON_ONCE(!changed); > @@ -2322,12 +2322,11 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, > > out_uninit_ctrl: > nvme_uninit_ctrl(&ctrl->ctrl); > +out_put_ctrl: > nvme_put_ctrl(&ctrl->ctrl); > if (ret > 0) > ret = -EIO; > return ERR_PTR(ret); > -out_kfree_queues: > - kfree(ctrl->queues); > out_free_ctrl: > kfree(ctrl); > return ERR_PTR(ret); > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c > index 8b5e4327fe83..9787a4fdbb00 100644 > --- a/drivers/nvme/host/tcp.c > +++ b/drivers/nvme/host/tcp.c > @@ -2759,7 +2759,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, > > ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0); > if (ret) > - goto out_kfree_queues; > + goto out_put_ctrl; > > if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) { > WARN_ON_ONCE(1); > @@ -2782,12 +2782,11 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, > > out_uninit_ctrl: > nvme_uninit_ctrl(&ctrl->ctrl); > +out_put_ctrl: > nvme_put_ctrl(&ctrl->ctrl); > if (ret > 0) > ret = -EIO; > return ERR_PTR(ret); > -out_kfree_queues: > - kfree(ctrl->queues); > out_free_ctrl: > kfree(ctrl); > return ERR_PTR(ret); > diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c > index e589915ddef8..6d1359915b6c 100644 > --- a/drivers/nvme/target/loop.c > +++ b/drivers/nvme/target/loop.c > @@ -550,10 +550,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, > > ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops, > 0 /* no quirks, we're perfect! */); > - if (ret) { > - kfree(ctrl); > + if (ret) > goto out; > - } > > if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) > WARN_ON_ONCE(1); > @@ -611,8 +609,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, > kfree(ctrl->queues); > out_uninit_ctrl: > nvme_uninit_ctrl(&ctrl->ctrl); > - nvme_put_ctrl(&ctrl->ctrl); > out: > + nvme_put_ctrl(&ctrl->ctrl); > if (ret > 0) > ret = -EIO; > return ERR_PTR(ret); > -- > 2.43.0 > Hmm. Don't really like it. We took great pains of moving then entire DH-CHAP code into its own file, and this patch reverts it and is causing the DH-CHAP code to be littered all over the place. Cheers, Hannes ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-06-03 8:37 ` Hannes Reinecke @ 2024-06-03 8:52 ` Maurizio Lombardi 2024-06-03 9:33 ` Hannes Reinecke 2024-06-03 9:35 ` Maurizio Lombardi 1 sibling, 1 reply; 19+ messages in thread From: Maurizio Lombardi @ 2024-06-03 8:52 UTC (permalink / raw) To: Hannes Reinecke Cc: Maurizio Lombardi, Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki po 3. 6. 2024 v 10:37 odesílatel Hannes Reinecke <hare@suse.de> napsal: > > On 5/31/24 16:37, Maurizio Lombardi wrote: > Hmm. Don't really like it. > We took great pains of moving then entire DH-CHAP code into its own > file, and this patch reverts it and is causing the DH-CHAP code to be > littered all over the place. This was just an example, I'm not asking to merge it. I just had to find a way to make it possible to call nvme_auth_stop() even when nvme_auth_init() has never been called without crashing the kernel, moving INIT_WORK was the easiest way. Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-06-03 8:52 ` Maurizio Lombardi @ 2024-06-03 9:33 ` Hannes Reinecke 2024-06-03 9:38 ` Maurizio Lombardi 0 siblings, 1 reply; 19+ messages in thread From: Hannes Reinecke @ 2024-06-03 9:33 UTC (permalink / raw) To: Maurizio Lombardi Cc: Maurizio Lombardi, Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki On 6/3/24 10:52, Maurizio Lombardi wrote: > po 3. 6. 2024 v 10:37 odesílatel Hannes Reinecke <hare@suse.de> napsal: >> >> On 5/31/24 16:37, Maurizio Lombardi wrote: >> Hmm. Don't really like it. >> We took great pains of moving then entire DH-CHAP code into its own >> file, and this patch reverts it and is causing the DH-CHAP code to be >> littered all over the place. > > This was just an example, I'm not asking to merge it. > > I just had to find a way to make it possible to call nvme_auth_stop() even > when nvme_auth_init() has never been called without crashing the kernel, > moving INIT_WORK was the easiest way. > Like this? diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index a264b3ae078b..668cc4d891d7 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -983,7 +983,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_init_ctrl); void nvme_auth_stop(struct nvme_ctrl *ctrl) { - cancel_work_sync(&ctrl->dhchap_auth_work); + if (ctrl->dhchap_ctxs) + cancel_work_sync(&ctrl->dhchap_auth_work); } EXPORT_SYMBOL_GPL(nvme_auth_stop); Cheers, Hannes ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-06-03 9:33 ` Hannes Reinecke @ 2024-06-03 9:38 ` Maurizio Lombardi 0 siblings, 0 replies; 19+ messages in thread From: Maurizio Lombardi @ 2024-06-03 9:38 UTC (permalink / raw) To: Hannes Reinecke Cc: Maurizio Lombardi, Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki po 3. 6. 2024 v 11:33 odesílatel Hannes Reinecke <hare@suse.de> napsal: > Like this? > > diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c > index a264b3ae078b..668cc4d891d7 100644 > --- a/drivers/nvme/host/auth.c > +++ b/drivers/nvme/host/auth.c > @@ -983,7 +983,8 @@ EXPORT_SYMBOL_GPL(nvme_auth_init_ctrl); > > void nvme_auth_stop(struct nvme_ctrl *ctrl) > { > - cancel_work_sync(&ctrl->dhchap_auth_work); > + if (ctrl->dhchap_ctxs) > + cancel_work_sync(&ctrl->dhchap_auth_work); > } > EXPORT_SYMBOL_GPL(nvme_auth_stop); > Yes, I was not sure if it was safe to use dhchap_ctxs because nvme_auth_init_ctrl() could return success while leaving a NULL dhchap_ctxs pointer. Maurizio ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: blktests nvme 041,042 leak memory 2024-06-03 8:37 ` Hannes Reinecke 2024-06-03 8:52 ` Maurizio Lombardi @ 2024-06-03 9:35 ` Maurizio Lombardi 1 sibling, 0 replies; 19+ messages in thread From: Maurizio Lombardi @ 2024-06-03 9:35 UTC (permalink / raw) To: Hannes Reinecke Cc: Maurizio Lombardi, Keith Busch, Sagi Grimberg, Yi Zhang, open list:NVM EXPRESS DRIVER, Shin'ichiro Kawasaki V Mon, Jun 03, 2024 at 10:37:33AM +0200, Hannes Reinecke napsal(a): > Hmm. Don't really like it. > We took great pains of moving then entire DH-CHAP code into its own > file, and this patch reverts it and is causing the DH-CHAP code to be > littered all over the place. The following would do the same thing without littering the auth code around. But the real problem of this approach is that, in case of failure, the nvme_init_ctrl() function is unable to cleanly revert all of its changes and cleaning it up is left to the callers (they have to execute nvme_put_ctrl()). diff --git a/drivers/nvme/host/auth.c b/drivers/nvme/host/auth.c index 371e14f0a203..df0c8211b381 100644 --- a/drivers/nvme/host/auth.c +++ b/drivers/nvme/host/auth.c @@ -940,6 +940,8 @@ int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) struct nvme_dhchap_queue_context *chap; int i, ret; + ctrl->auth_initialized = true; + mutex_init(&ctrl->dhchap_auth_mutex); INIT_WORK(&ctrl->dhchap_auth_work, nvme_ctrl_auth_work); if (!ctrl->opts) @@ -947,7 +949,7 @@ int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) ret = nvme_auth_generate_key(ctrl->opts->dhchap_secret, &ctrl->host_key); if (ret) - return ret; + goto out; ret = nvme_auth_generate_key(ctrl->opts->dhchap_ctrl_secret, &ctrl->ctrl_key); if (ret) @@ -977,13 +979,16 @@ int nvme_auth_init_ctrl(struct nvme_ctrl *ctrl) err_free_dhchap_secret: nvme_auth_free_key(ctrl->host_key); ctrl->host_key = NULL; +out: + ctrl->auth_initialized = false; return ret; } EXPORT_SYMBOL_GPL(nvme_auth_init_ctrl); void nvme_auth_stop(struct nvme_ctrl *ctrl) { - cancel_work_sync(&ctrl->dhchap_auth_work); + if (ctrl->auth_initialized) + cancel_work_sync(&ctrl->dhchap_auth_work); } EXPORT_SYMBOL_GPL(nvme_auth_stop); diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 954f850f113a..68b9f2ce98c8 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4572,14 +4572,16 @@ static void nvme_free_ctrl(struct device *dev) container_of(dev, struct nvme_ctrl, ctrl_device); struct nvme_subsystem *subsys = ctrl->subsys; - if (!subsys || ctrl->instance != subsys->instance) + if (ctrl->instance >= 0 && + (!subsys || ctrl->instance != subsys->instance)) ida_free(&nvme_instance_ida, ctrl->instance); key_put(ctrl->tls_key); nvme_free_cels(ctrl); nvme_mpath_uninit(ctrl); nvme_auth_stop(ctrl); nvme_auth_free(ctrl); - __free_page(ctrl->discard_page); + if (ctrl->discard_page) + __free_page(ctrl->discard_page); free_opal_dev(ctrl->opal_dev); if (subsys) { @@ -4631,16 +4633,8 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, BUILD_BUG_ON(NVME_DSM_MAX_RANGES * sizeof(struct nvme_dsm_range) > PAGE_SIZE); - ctrl->discard_page = alloc_page(GFP_KERNEL); - if (!ctrl->discard_page) { - ret = -ENOMEM; - goto out; - } - ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); - if (ret < 0) - goto out; - ctrl->instance = ret; + ctrl->instance = -1; device_initialize(&ctrl->ctrl_device); ctrl->device = &ctrl->ctrl_device; @@ -4654,16 +4648,28 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, ctrl->device->groups = nvme_dev_attr_groups; ctrl->device->release = nvme_free_ctrl; dev_set_drvdata(ctrl->device, ctrl); + + ret = ida_alloc(&nvme_instance_ida, GFP_KERNEL); + if (ret < 0) + goto out; + + ctrl->instance = ret; ret = dev_set_name(ctrl->device, "nvme%d", ctrl->instance); if (ret) - goto out_release_instance; + goto out; + + ctrl->discard_page = alloc_page(GFP_KERNEL); + if (!ctrl->discard_page) { + ret = -ENOMEM; + goto out; + } nvme_get_ctrl(ctrl); cdev_init(&ctrl->cdev, &nvme_dev_fops); ctrl->cdev.owner = ops->module; ret = cdev_device_add(&ctrl->cdev, ctrl->device); if (ret) - goto out_free_name; + goto out_put_ctrl; /* * Initialize latency tolerance controls. The sysfs files won't @@ -4684,14 +4690,9 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev, nvme_fault_inject_fini(&ctrl->fault_inject); dev_pm_qos_hide_latency_tolerance(ctrl->device); cdev_device_del(&ctrl->cdev, ctrl->device); -out_free_name: +out_put_ctrl: nvme_put_ctrl(ctrl); - kfree_const(ctrl->device->kobj.name); -out_release_instance: - ida_free(&nvme_instance_ida, ctrl->instance); out: - if (ctrl->discard_page) - __free_page(ctrl->discard_page); return ret; } EXPORT_SYMBOL_GPL(nvme_init_ctrl); diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index cacc56f4bbf4..c723c51e244a 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -368,6 +368,7 @@ struct nvme_ctrl { struct nvme_dhchap_key *host_key; struct nvme_dhchap_key *ctrl_key; u16 transaction; + bool auth_initialized; #endif struct key *tls_key; diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 51a62b0c645a..bc749bebe134 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -2302,7 +2302,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops, 0 /* no quirks, we're perfect! */); if (ret) - goto out_kfree_queues; + goto out_put_ctrl; changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING); WARN_ON_ONCE(!changed); @@ -2322,12 +2322,11 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev, out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); +out_put_ctrl: nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -out_kfree_queues: - kfree(ctrl->queues); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 8b5e4327fe83..9787a4fdbb00 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2759,7 +2759,7 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0); if (ret) - goto out_kfree_queues; + goto out_put_ctrl; if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) { WARN_ON_ONCE(1); @@ -2782,12 +2782,11 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev, out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); +out_put_ctrl: nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -out_kfree_queues: - kfree(ctrl->queues); out_free_ctrl: kfree(ctrl); return ERR_PTR(ret); diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c index e589915ddef8..6d1359915b6c 100644 --- a/drivers/nvme/target/loop.c +++ b/drivers/nvme/target/loop.c @@ -550,10 +550,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops, 0 /* no quirks, we're perfect! */); - if (ret) { - kfree(ctrl); + if (ret) goto out; - } if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) WARN_ON_ONCE(1); @@ -611,8 +609,8 @@ static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev, kfree(ctrl->queues); out_uninit_ctrl: nvme_uninit_ctrl(&ctrl->ctrl); - nvme_put_ctrl(&ctrl->ctrl); out: + nvme_put_ctrl(&ctrl->ctrl); if (ret > 0) ret = -EIO; return ERR_PTR(ret); -- 2.43.0 ^ permalink raw reply related [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-06-03 9:39 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-05-26 19:47 blktests nvme 041,042 leak memory Sagi Grimberg 2024-05-27 1:11 ` Yi Zhang 2024-05-28 9:44 ` Maurizio Lombardi 2024-05-29 9:08 ` Maurizio Lombardi 2024-05-29 12:51 ` Sagi Grimberg 2024-05-29 13:45 ` Maurizio Lombardi 2024-05-29 16:25 ` Keith Busch 2024-05-29 16:48 ` Keith Busch 2024-05-30 6:59 ` Sagi Grimberg 2024-05-30 7:25 ` Maurizio Lombardi 2024-05-30 16:29 ` Keith Busch 2024-05-30 17:40 ` Maurizio Lombardi 2024-05-31 11:55 ` Maurizio Lombardi 2024-05-31 14:37 ` Maurizio Lombardi 2024-06-03 8:37 ` Hannes Reinecke 2024-06-03 8:52 ` Maurizio Lombardi 2024-06-03 9:33 ` Hannes Reinecke 2024-06-03 9:38 ` Maurizio Lombardi 2024-06-03 9:35 ` Maurizio Lombardi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox