* [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling
@ 2020-04-27 12:34 Niklas Cassel
2020-04-27 15:06 ` Christoph Hellwig
2020-04-27 18:03 ` Javier González
0 siblings, 2 replies; 7+ messages in thread
From: Niklas Cassel @ 2020-04-27 12:34 UTC (permalink / raw)
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
Igor Konopko, Matias Bjørling, Javier González
Cc: Niklas Cassel, Jens Axboe, linux-nvme, linux-kernel
When jumping to the out_put_disk label, we will call put_disk(), which will
trigger a call to disk_release(), which calls blk_put_queue().
Later in the cleanup code, we do blk_cleanup_queue(), which will also call
blk_put_queue().
Putting the queue twice is incorrect, and will generate a KASAN splat.
Set the disk->queue pointer to NULL, before calling put_disk(), so that the
first call to blk_put_queue() will not free the queue.
The second call to blk_put_queue() uses another pointer to the same queue,
so this call will still free the queue.
Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
drivers/nvme/host/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91c1bd659947..f2adea96b04c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
return;
out_put_disk:
+ /* prevent double queue cleanup */
+ ns->disk->queue = NULL;
put_disk(ns->disk);
out_unlink_ns:
mutex_lock(&ctrl->subsys->lock);
--
2.25.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-27 12:34 [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling Niklas Cassel @ 2020-04-27 15:06 ` Christoph Hellwig 2020-04-27 18:03 ` Javier González 1 sibling, 0 replies; 7+ messages in thread From: Christoph Hellwig @ 2020-04-27 15:06 UTC (permalink / raw) To: Niklas Cassel Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Javier González, Jens Axboe, linux-nvme, linux-kernel Applied to nvme-5.7. Not a huge fan of it, though - but I think I have an idea how to sort some of this mess out with better block layer APIs. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-27 12:34 [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling Niklas Cassel 2020-04-27 15:06 ` Christoph Hellwig @ 2020-04-27 18:03 ` Javier González 2020-04-27 18:22 ` Niklas Cassel 1 sibling, 1 reply; 7+ messages in thread From: Javier González @ 2020-04-27 18:03 UTC (permalink / raw) To: Niklas Cassel Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Jens Axboe, linux-nvme, linux-kernel On 27.04.2020 14:34, Niklas Cassel wrote: >When jumping to the out_put_disk label, we will call put_disk(), which will >trigger a call to disk_release(), which calls blk_put_queue(). > >Later in the cleanup code, we do blk_cleanup_queue(), which will also call >blk_put_queue(). > >Putting the queue twice is incorrect, and will generate a KASAN splat. > >Set the disk->queue pointer to NULL, before calling put_disk(), so that the >first call to blk_put_queue() will not free the queue. > >The second call to blk_put_queue() uses another pointer to the same queue, >so this call will still free the queue. > >Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration") >Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> >--- > drivers/nvme/host/core.c | 2 ++ > 1 file changed, 2 insertions(+) > >diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >index 91c1bd659947..f2adea96b04c 100644 >--- a/drivers/nvme/host/core.c >+++ b/drivers/nvme/host/core.c >@@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > return; > out_put_disk: >+ /* prevent double queue cleanup */ >+ ns->disk->queue = NULL; > put_disk(ns->disk); > out_unlink_ns: > mutex_lock(&ctrl->subsys->lock); >-- >2.25.3 > What about delaying the assignment of ns->disk? diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c index a4d8c90ee7cc..6da4a9ced945 100644 --- i/drivers/nvme/host/core.c +++ w/drivers/nvme/host/core.c @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) disk->queue = ns->queue; disk->flags = flags; memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); - ns->disk = disk; __nvme_revalidate_disk(disk, id); @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) } } + ns->disk = disk; + down_write(&ctrl->namespaces_rwsem); list_add_tail(&ns->list, &ctrl->namespaces); up_write(&ctrl->namespaces_rwsem); Javier ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-27 18:03 ` Javier González @ 2020-04-27 18:22 ` Niklas Cassel 2020-04-28 7:06 ` Javier González 0 siblings, 1 reply; 7+ messages in thread From: Niklas Cassel @ 2020-04-27 18:22 UTC (permalink / raw) To: Javier González Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Jens Axboe, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Mon, Apr 27, 2020 at 08:03:11PM +0200, Javier González wrote: > On 27.04.2020 14:34, Niklas Cassel wrote: > > When jumping to the out_put_disk label, we will call put_disk(), which will > > trigger a call to disk_release(), which calls blk_put_queue(). > > > > Later in the cleanup code, we do blk_cleanup_queue(), which will also call > > blk_put_queue(). > > > > Putting the queue twice is incorrect, and will generate a KASAN splat. > > > > Set the disk->queue pointer to NULL, before calling put_disk(), so that the > > first call to blk_put_queue() will not free the queue. > > > > The second call to blk_put_queue() uses another pointer to the same queue, > > so this call will still free the queue. > > > > Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration") > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> > > --- > > drivers/nvme/host/core.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > index 91c1bd659947..f2adea96b04c 100644 > > --- a/drivers/nvme/host/core.c > > +++ b/drivers/nvme/host/core.c > > @@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > > > return; > > out_put_disk: > > + /* prevent double queue cleanup */ > > + ns->disk->queue = NULL; > > put_disk(ns->disk); > > out_unlink_ns: > > mutex_lock(&ctrl->subsys->lock); > > -- > > 2.25.3 > > > What about delaying the assignment of ns->disk? > > diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c > index a4d8c90ee7cc..6da4a9ced945 100644 > --- i/drivers/nvme/host/core.c > +++ w/drivers/nvme/host/core.c > @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > disk->queue = ns->queue; > disk->flags = flags; > memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); > - ns->disk = disk; > > __nvme_revalidate_disk(disk, id); > > @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > } > } > > + ns->disk = disk; > + Hello Javier! The only case where we jump to the out_put_disk label, is if the nvme_nvm_register() call failed. In that case, we want to undo the alloc_disk_node() operation, i.e., decrease the refcount. If we don't set "ns->disk = disk;" before the call to nvme_nvm_register(), then, if register fails, and we jump to the put_disk(ns->disk) label, ns->disk will be NULL, so the recount will not be decreased, so I assume that this memory would then be a memory leak. I think that the problem is that the block functions are a bit messy. Most drivers seem to do blk_cleanup_queue() first and then do put_disk(), but some drivers do it in the opposite way, so I think that we might have some more use-after-free bugs in some of these drivers that do it in the opposite way. Kind regards, Niklas > down_write(&ctrl->namespaces_rwsem); > list_add_tail(&ns->list, &ctrl->namespaces); > up_write(&ctrl->namespaces_rwsem); > > > Javier ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-27 18:22 ` Niklas Cassel @ 2020-04-28 7:06 ` Javier González 2020-04-28 7:49 ` Niklas Cassel 0 siblings, 1 reply; 7+ messages in thread From: Javier González @ 2020-04-28 7:06 UTC (permalink / raw) To: Niklas Cassel Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Jens Axboe, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On 27.04.2020 18:22, Niklas Cassel wrote: >On Mon, Apr 27, 2020 at 08:03:11PM +0200, Javier González wrote: >> On 27.04.2020 14:34, Niklas Cassel wrote: >> > When jumping to the out_put_disk label, we will call put_disk(), which will >> > trigger a call to disk_release(), which calls blk_put_queue(). >> > >> > Later in the cleanup code, we do blk_cleanup_queue(), which will also call >> > blk_put_queue(). >> > >> > Putting the queue twice is incorrect, and will generate a KASAN splat. >> > >> > Set the disk->queue pointer to NULL, before calling put_disk(), so that the >> > first call to blk_put_queue() will not free the queue. >> > >> > The second call to blk_put_queue() uses another pointer to the same queue, >> > so this call will still free the queue. >> > >> > Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration") >> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> >> > --- >> > drivers/nvme/host/core.c | 2 ++ >> > 1 file changed, 2 insertions(+) >> > >> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >> > index 91c1bd659947..f2adea96b04c 100644 >> > --- a/drivers/nvme/host/core.c >> > +++ b/drivers/nvme/host/core.c >> > @@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> > >> > return; >> > out_put_disk: >> > + /* prevent double queue cleanup */ >> > + ns->disk->queue = NULL; >> > put_disk(ns->disk); >> > out_unlink_ns: >> > mutex_lock(&ctrl->subsys->lock); >> > -- >> > 2.25.3 >> > >> What about delaying the assignment of ns->disk? >> >> diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c >> index a4d8c90ee7cc..6da4a9ced945 100644 >> --- i/drivers/nvme/host/core.c >> +++ w/drivers/nvme/host/core.c >> @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> disk->queue = ns->queue; >> disk->flags = flags; >> memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); >> - ns->disk = disk; >> >> __nvme_revalidate_disk(disk, id); >> >> @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> } >> } >> >> + ns->disk = disk; >> + > >Hello Javier! > > >The only case where we jump to the out_put_disk label, is if the >nvme_nvm_register() call failed. > >In that case, we want to undo the alloc_disk_node() operation, i.e., >decrease the refcount. > >If we don't set "ns->disk = disk;" before the call to nvme_nvm_register(), >then, if register fails, and we jump to the put_disk(ns->disk) label, >ns->disk will be NULL, so the recount will not be decreased, so I assume >that this memory would then be a memory leak. > > >I think that the problem is that the block functions are a bit messy. >Most drivers seem to do blk_cleanup_queue() first and then do put_disk(), >but some drivers do it in the opposite way, so I think that we might have >some more use-after-free bugs in some of these drivers that do it in the >opposite way. > Hi Niklas, Yes, the out_put_disk label was introduced at the same time as the LightNVM entry point. We can do a better job at separating the cleanup functions, but as far as I can see ns->disk is not used in the LightNVM initialization, so delaying the initialization should be ok. Part of this should be also changing the out_put_disk to put_disk(disk). Note that initializing other namespace types here do not require ns->disk either, so delaying initialization should be ok. We have been running with this patch locally for some time. This said, this is an alternative as your fix works. Javier ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-28 7:06 ` Javier González @ 2020-04-28 7:49 ` Niklas Cassel 2020-04-28 10:32 ` Javier González 0 siblings, 1 reply; 7+ messages in thread From: Niklas Cassel @ 2020-04-28 7:49 UTC (permalink / raw) To: Javier González Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Jens Axboe, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On Tue, Apr 28, 2020 at 09:06:51AM +0200, Javier González wrote: > CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe. > > > On 27.04.2020 18:22, Niklas Cassel wrote: > > On Mon, Apr 27, 2020 at 08:03:11PM +0200, Javier González wrote: > > > On 27.04.2020 14:34, Niklas Cassel wrote: > > > > When jumping to the out_put_disk label, we will call put_disk(), which will > > > > trigger a call to disk_release(), which calls blk_put_queue(). > > > > > > > > Later in the cleanup code, we do blk_cleanup_queue(), which will also call > > > > blk_put_queue(). > > > > > > > > Putting the queue twice is incorrect, and will generate a KASAN splat. > > > > > > > > Set the disk->queue pointer to NULL, before calling put_disk(), so that the > > > > first call to blk_put_queue() will not free the queue. > > > > > > > > The second call to blk_put_queue() uses another pointer to the same queue, > > > > so this call will still free the queue. > > > > > > > > Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration") > > > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> > > > > --- > > > > drivers/nvme/host/core.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > > > > index 91c1bd659947..f2adea96b04c 100644 > > > > --- a/drivers/nvme/host/core.c > > > > +++ b/drivers/nvme/host/core.c > > > > @@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > > > > > > > return; > > > > out_put_disk: > > > > + /* prevent double queue cleanup */ > > > > + ns->disk->queue = NULL; > > > > put_disk(ns->disk); > > > > out_unlink_ns: > > > > mutex_lock(&ctrl->subsys->lock); > > > > -- > > > > 2.25.3 > > > > > > > What about delaying the assignment of ns->disk? > > > > > > diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c > > > index a4d8c90ee7cc..6da4a9ced945 100644 > > > --- i/drivers/nvme/host/core.c > > > +++ w/drivers/nvme/host/core.c > > > @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > > disk->queue = ns->queue; > > > disk->flags = flags; > > > memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); > > > - ns->disk = disk; > > > > > > __nvme_revalidate_disk(disk, id); > > > > > > @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > > } > > > } > > > > > > + ns->disk = disk; > > > + > > > > Hello Javier! > > > > > > The only case where we jump to the out_put_disk label, is if the > > nvme_nvm_register() call failed. > > > > In that case, we want to undo the alloc_disk_node() operation, i.e., > > decrease the refcount. > > > > If we don't set "ns->disk = disk;" before the call to nvme_nvm_register(), > > then, if register fails, and we jump to the put_disk(ns->disk) label, > > ns->disk will be NULL, so the recount will not be decreased, so I assume > > that this memory would then be a memory leak. > > > > > > I think that the problem is that the block functions are a bit messy. > > Most drivers seem to do blk_cleanup_queue() first and then do put_disk(), > > but some drivers do it in the opposite way, so I think that we might have > > some more use-after-free bugs in some of these drivers that do it in the > > opposite way. > > > > Hi Niklas, > > Yes, the out_put_disk label was introduced at the same time as the > LightNVM entry point. We can do a better job at separating the cleanup > functions, but as far as I can see ns->disk is not used in the LightNVM > initialization, so delaying the initialization should be ok. Part of > this should be also changing the out_put_disk to put_disk(disk). Hello Javier, If I understand correctly, you are suggesting this: --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3616,7 +3616,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) disk->queue = ns->queue; disk->flags = flags; memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); - ns->disk = disk; __nvme_revalidate_disk(disk, id); @@ -3628,6 +3627,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) } } + ns->disk = disk; + down_write(&ctrl->namespaces_rwsem); list_add_tail(&ns->list, &ctrl->namespaces); up_write(&ctrl->namespaces_rwsem); @@ -3642,7 +3643,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) return; out_put_disk: - put_disk(ns->disk); + put_disk(disk); out_unlink_ns: mutex_lock(&ctrl->subsys->lock); list_del_rcu(&ns->siblings); That would not solve the double free error when the registration fails, since disk->queue is still set, so this call will still free the queue. Later in the cleanup code blk_cleanup_queue(ns->queue); is called, which will still try to free what is already freed. To remove the double free by delaying assignments, we would need to delay the assignment of disk->queue until after the LightNVM entry point, but I don't think that is possible, since __nvme_revalidate_disk(), which is called before the LightNVM entry point uses disk->queue. Christoph said that he would clean up this mess with better block layer APIs. So perhaps the fix that is already queued is good enough until then. Kind regards, Niklas ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling 2020-04-28 7:49 ` Niklas Cassel @ 2020-04-28 10:32 ` Javier González 0 siblings, 0 replies; 7+ messages in thread From: Javier González @ 2020-04-28 10:32 UTC (permalink / raw) To: Niklas Cassel Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Igor Konopko, Matias Bjørling, Jens Axboe, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org On 28.04.2020 07:49, Niklas Cassel wrote: >On Tue, Apr 28, 2020 at 09:06:51AM +0200, Javier González wrote: >> CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe. >> >> >> On 27.04.2020 18:22, Niklas Cassel wrote: >> > On Mon, Apr 27, 2020 at 08:03:11PM +0200, Javier González wrote: >> > > On 27.04.2020 14:34, Niklas Cassel wrote: >> > > > When jumping to the out_put_disk label, we will call put_disk(), which will >> > > > trigger a call to disk_release(), which calls blk_put_queue(). >> > > > >> > > > Later in the cleanup code, we do blk_cleanup_queue(), which will also call >> > > > blk_put_queue(). >> > > > >> > > > Putting the queue twice is incorrect, and will generate a KASAN splat. >> > > > >> > > > Set the disk->queue pointer to NULL, before calling put_disk(), so that the >> > > > first call to blk_put_queue() will not free the queue. >> > > > >> > > > The second call to blk_put_queue() uses another pointer to the same queue, >> > > > so this call will still free the queue. >> > > > >> > > > Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration") >> > > > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> >> > > > --- >> > > > drivers/nvme/host/core.c | 2 ++ >> > > > 1 file changed, 2 insertions(+) >> > > > >> > > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c >> > > > index 91c1bd659947..f2adea96b04c 100644 >> > > > --- a/drivers/nvme/host/core.c >> > > > +++ b/drivers/nvme/host/core.c >> > > > @@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> > > > >> > > > return; >> > > > out_put_disk: >> > > > + /* prevent double queue cleanup */ >> > > > + ns->disk->queue = NULL; >> > > > put_disk(ns->disk); >> > > > out_unlink_ns: >> > > > mutex_lock(&ctrl->subsys->lock); >> > > > -- >> > > > 2.25.3 >> > > > >> > > What about delaying the assignment of ns->disk? >> > > >> > > diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c >> > > index a4d8c90ee7cc..6da4a9ced945 100644 >> > > --- i/drivers/nvme/host/core.c >> > > +++ w/drivers/nvme/host/core.c >> > > @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> > > disk->queue = ns->queue; >> > > disk->flags = flags; >> > > memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); >> > > - ns->disk = disk; >> > > >> > > __nvme_revalidate_disk(disk, id); >> > > >> > > @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) >> > > } >> > > } >> > > >> > > + ns->disk = disk; >> > > + >> > >> > Hello Javier! >> > >> > >> > The only case where we jump to the out_put_disk label, is if the >> > nvme_nvm_register() call failed. >> > >> > In that case, we want to undo the alloc_disk_node() operation, i.e., >> > decrease the refcount. >> > >> > If we don't set "ns->disk = disk;" before the call to nvme_nvm_register(), >> > then, if register fails, and we jump to the put_disk(ns->disk) label, >> > ns->disk will be NULL, so the recount will not be decreased, so I assume >> > that this memory would then be a memory leak. >> > >> > >> > I think that the problem is that the block functions are a bit messy. >> > Most drivers seem to do blk_cleanup_queue() first and then do put_disk(), >> > but some drivers do it in the opposite way, so I think that we might have >> > some more use-after-free bugs in some of these drivers that do it in the >> > opposite way. >> > >> >> Hi Niklas, >> >> Yes, the out_put_disk label was introduced at the same time as the >> LightNVM entry point. We can do a better job at separating the cleanup >> functions, but as far as I can see ns->disk is not used in the LightNVM >> initialization, so delaying the initialization should be ok. Part of >> this should be also changing the out_put_disk to put_disk(disk). > >Hello Javier, > > >If I understand correctly, you are suggesting this: > >--- a/drivers/nvme/host/core.c >+++ b/drivers/nvme/host/core.c >@@ -3616,7 +3616,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > disk->queue = ns->queue; > disk->flags = flags; > memcpy(disk->disk_name, disk_name, DISK_NAME_LEN); >- ns->disk = disk; > > __nvme_revalidate_disk(disk, id); > >@@ -3628,6 +3627,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > } > } > >+ ns->disk = disk; >+ > down_write(&ctrl->namespaces_rwsem); > list_add_tail(&ns->list, &ctrl->namespaces); > up_write(&ctrl->namespaces_rwsem); >@@ -3642,7 +3643,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid) > > return; > out_put_disk: >- put_disk(ns->disk); >+ put_disk(disk); > out_unlink_ns: > mutex_lock(&ctrl->subsys->lock); > list_del_rcu(&ns->siblings); > > >That would not solve the double free error when the registration fails, >since disk->queue is still set, so this call will still free the queue. > >Later in the cleanup code blk_cleanup_queue(ns->queue); >is called, which will still try to free what is already freed. > >To remove the double free by delaying assignments, we would need to >delay the assignment of disk->queue until after the LightNVM entry point, >but I don't think that is possible, since __nvme_revalidate_disk(), which >is called before the LightNVM entry point uses disk->queue. > >Christoph said that he would clean up this mess with better block layer >APIs. So perhaps the fix that is already queued is good enough until then. > Makes sense. I see some things have changed in this path since I last looked at it. Thanks! Javier ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-04-28 10:32 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-04-27 12:34 [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling Niklas Cassel 2020-04-27 15:06 ` Christoph Hellwig 2020-04-27 18:03 ` Javier González 2020-04-27 18:22 ` Niklas Cassel 2020-04-28 7:06 ` Javier González 2020-04-28 7:49 ` Niklas Cassel 2020-04-28 10:32 ` Javier González
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox