From: Niklas Cassel <Niklas.Cassel@wdc.com>
To: "Javier González" <javier@javigon.com>
Cc: "Jens Axboe" <axboe@kernel.dk>,
"Sagi Grimberg" <sagi@grimberg.me>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"Jens Axboe" <axboe@fb.com>, "Keith Busch" <kbusch@kernel.org>,
"Matias Bjørling" <mb@lightnvm.io>,
"Christoph Hellwig" <hch@lst.de>,
"Igor Konopko" <igor.j.konopko@intel.com>
Subject: Re: [PATCH] nvme: prevent double free in nvme_alloc_ns() error handling
Date: Mon, 27 Apr 2020 18:22:46 +0000
Message-ID: <20200427182245.GA547726@localhost.localdomain>
In-Reply-To: <20200427180311.nssquibbak5ib4oo@mpHalley.localdomain>
On Mon, Apr 27, 2020 at 08:03:11PM +0200, Javier González wrote:
> On 27.04.2020 14:34, Niklas Cassel wrote:
> > When jumping to the out_put_disk label, we will call put_disk(), which will
> > trigger a call to disk_release(), which calls blk_put_queue().
> >
> > Later in the cleanup code, we do blk_cleanup_queue(), which will also call
> > blk_put_queue().
> >
> > Putting the queue twice is incorrect, and will generate a KASAN splat.
> >
> > Set the disk->queue pointer to NULL, before calling put_disk(), so that the
> > first call to blk_put_queue() will not free the queue.
> >
> > The second call to blk_put_queue() uses another pointer to the same queue,
> > so this call will still free the queue.
> >
> > Fixes: 85136c010285 ("lightnvm: simplify geometry enumeration")
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> > drivers/nvme/host/core.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 91c1bd659947..f2adea96b04c 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3642,6 +3642,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> >
> > return;
> > out_put_disk:
> > + /* prevent double queue cleanup */
> > + ns->disk->queue = NULL;
> > put_disk(ns->disk);
> > out_unlink_ns:
> > mutex_lock(&ctrl->subsys->lock);
> > --
> > 2.25.3
> >
> What about delaying the assignment of ns->disk?
>
> diff --git i/drivers/nvme/host/core.c w/drivers/nvme/host/core.c
> index a4d8c90ee7cc..6da4a9ced945 100644
> --- i/drivers/nvme/host/core.c
> +++ w/drivers/nvme/host/core.c
> @@ -3541,7 +3541,6 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> disk->queue = ns->queue;
> disk->flags = flags;
> memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
> - ns->disk = disk;
>
> __nvme_revalidate_disk(disk, id);
>
> @@ -3553,6 +3552,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> }
> }
>
> + ns->disk = disk;
> +
Hello Javier!
The only case where we jump to the out_put_disk label, is if the
nvme_nvm_register() call failed.
In that case, we want to undo the alloc_disk_node() operation, i.e.,
decrease the refcount.
If we don't set "ns->disk = disk;" before the call to nvme_nvm_register(),
then, if register fails and we jump to the out_put_disk label,
put_disk(ns->disk) will be called with ns->disk == NULL, so the refcount
will not be decreased, and I assume that the disk would then be leaked.
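
To illustrate the leak argument above, here is a minimal userspace sketch,
not kernel code. The toy_* names are hypothetical stand-ins, and
toy_put_disk() is assumed to ignore a NULL pointer the way put_disk() does.
It only shows that the reference taken at allocation is never dropped when
ns->disk is assigned after the failing step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct gendisk and struct nvme_ns. */
struct toy_disk { int refcount; };
struct toy_ns { struct toy_disk *disk; };

/* Assumed to mirror put_disk(): a NULL disk is silently ignored. */
static void toy_put_disk(struct toy_disk *d)
{
	if (!d)
		return;
	assert(d->refcount > 0);
	d->refcount--;
}

/*
 * Models only the failing path of nvme_alloc_ns(): registration fails
 * and we jump to out_put_disk. Returns the refcount left on the disk:
 * 0 means the reference was dropped, 1 means it was leaked.
 */
static int toy_alloc_ns(int assign_early)
{
	struct toy_disk disk = { .refcount = 1 }; /* alloc_disk_node() stand-in */
	struct toy_ns ns = { .disk = NULL };

	if (assign_early)
		ns.disk = &disk;

	/* Pretend nvme_nvm_register() failed here: goto out_put_disk. */
	toy_put_disk(ns.disk);

	return disk.refcount;
}
```

In this model, toy_alloc_ns(1) leaves a refcount of 0, while toy_alloc_ns(0)
leaves the original reference in place, which is the leak described above.
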
I think that the problem is that the block functions are a bit messy.
Most drivers seem to do blk_cleanup_queue() first and then do put_disk(),
but some drivers do it in the opposite order, so I think that we might
have more use-after-free bugs in those drivers.
Kind regards,
Niklas
> down_write(&ctrl->namespaces_rwsem);
> list_add_tail(&ns->list, &ctrl->namespaces);
> up_write(&ctrl->namespaces_rwsem);
>
>
> Javier
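
The double put described in the patch, and the effect of clearing
disk->queue first, can be sketched in userspace as follows. This is a toy
model under stated assumptions, not kernel code: the model_* names are
hypothetical, the queue is assumed to hold a single reference on this error
path, and the release function is assumed to skip the put when the queue
pointer is NULL, as the patch description implies. A put on an
already-released queue is counted instead of actually touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct request_queue and struct gendisk. */
struct model_queue { int refcount; int bad_puts; };
struct model_disk { struct model_queue *queue; };

/* blk_put_queue() stand-in: records a put on an already-released queue. */
static void model_put_queue(struct model_queue *q)
{
	assert(q != NULL);
	if (q->refcount == 0) {
		q->bad_puts++; /* in the kernel this would be a use-after-free */
		return;
	}
	q->refcount--;
}

/* disk_release() stand-in: assumed to put the queue only if it is set. */
static void model_disk_release(struct model_disk *d)
{
	if (d->queue)
		model_put_queue(d->queue);
}

/*
 * Models the out_put_disk error path. Returns the number of puts made
 * on an already-released queue (0 means the path is balanced).
 */
static int model_error_path(int clear_queue_first)
{
	struct model_queue q = { .refcount = 1, .bad_puts = 0 };
	struct model_disk d = { .queue = &q };

	if (clear_queue_first)
		d.queue = NULL;   /* the two lines added by the patch */

	model_disk_release(&d);   /* put_disk() -> disk_release() */
	model_put_queue(&q);      /* blk_cleanup_queue() -> blk_put_queue() */

	return q.bad_puts;
}
```

Here model_error_path(0) reports one bad put, matching the KASAN splat
scenario, while model_error_path(1) reports none because the only put comes
from the later queue cleanup.
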
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme