All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>, Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: NVMe induced NULL deref in bt_iter()
Date: Mon, 3 Jul 2017 23:54:44 +0800	[thread overview]
Message-ID: <20170703155443.GC28651@ming.t460p> (raw)
In-Reply-To: <c381e5ab-51b6-494f-db62-09cd4bfdfe39@mellanox.com>

On Mon, Jul 03, 2017 at 03:46:34PM +0300, Max Gurtovoy wrote:
> 
> 
> On 7/3/2017 3:03 PM, Ming Lei wrote:
> > On Mon, Jul 03, 2017 at 01:07:44PM +0300, Sagi Grimberg wrote:
> > > Hi Ming,
> > > 
> > > > Yeah, the above change is correct, for any canceling requests in this
> > > > way we should use blk_mq_quiesce_queue().
> > > 
> > > I still don't understand why should blk_mq_flush_busy_ctxs hit a NULL
> > > deref if we don't touch the tagset...
> > 
> > Looks no one mentioned the steps for reproduction, then it isn't easy
> > to understand the related use case, could anyone share the steps for
> > reproduction?
> 
> Hi Ming,
> I create 500 ns per 1 subsystem (using with CX4 target and C-IB initiator
> but also saw it in CX5 vs. CX5 setup).
> The null deref happens when I remove all configuration in the target (1 port
> 1 subsystem and 500 namespaces and nvmet modules unload) during traffic to 1
> nvme device/ns from the intiator.
> I get Null deref in blk_mq_flush_busy_ctxs function that calls
> sbitmap_for_each_set in the initiator. seems like the "struct sbitmap_word
> *word = &sb->map[i];" is null. It's actually might be not null in the
> beginning of the func and become null during running the while loop there.

So looks it is still a normal release in initiator.

Per my experience, without quiescing queue before
blk_mq_tagset_busy_iter() for canceling requests, request double free
can be caused: one submitted req in .queue_rq can completed in
blk_mq_end_request(), meantime it can be completed in
nvme_cancel_request(). That is why we have to quiescing queue
first before canceling request in this way. Except for NVMe, looks
NBD and mtip32xx need fix too.

This way might cause blk_cleanup_queue() to complete early, then NULL
deref can be triggered in blk_mq_flush_busy_ctxs(). But in my previous
debug in PCI NVMe, this wasn't seen yet.

It should have been verified if the above is true by adding some debug
message inside blk_cleanup_queue(). 


Thanks,
Ming

WARNING: multiple messages have this Message-ID (diff)
From: ming.lei@redhat.com (Ming Lei)
Subject: NVMe induced NULL deref in bt_iter()
Date: Mon, 3 Jul 2017 23:54:44 +0800	[thread overview]
Message-ID: <20170703155443.GC28651@ming.t460p> (raw)
In-Reply-To: <c381e5ab-51b6-494f-db62-09cd4bfdfe39@mellanox.com>

On Mon, Jul 03, 2017@03:46:34PM +0300, Max Gurtovoy wrote:
> 
> 
> On 7/3/2017 3:03 PM, Ming Lei wrote:
> > On Mon, Jul 03, 2017@01:07:44PM +0300, Sagi Grimberg wrote:
> > > Hi Ming,
> > > 
> > > > Yeah, the above change is correct, for any canceling requests in this
> > > > way we should use blk_mq_quiesce_queue().
> > > 
> > > I still don't understand why should blk_mq_flush_busy_ctxs hit a NULL
> > > deref if we don't touch the tagset...
> > 
> > Looks no one mentioned the steps for reproduction, then it isn't easy
> > to understand the related use case, could anyone share the steps for
> > reproduction?
> 
> Hi Ming,
> I create 500 ns per 1 subsystem (using with CX4 target and C-IB initiator
> but also saw it in CX5 vs. CX5 setup).
> The null deref happens when I remove all configuration in the target (1 port
> 1 subsystem and 500 namespaces and nvmet modules unload) during traffic to 1
> nvme device/ns from the intiator.
> I get Null deref in blk_mq_flush_busy_ctxs function that calls
> sbitmap_for_each_set in the initiator. seems like the "struct sbitmap_word
> *word = &sb->map[i];" is null. It's actually might be not null in the
> beginning of the func and become null during running the while loop there.

So looks it is still a normal release in initiator.

Per my experience, without quiescing queue before
blk_mq_tagset_busy_iter() for canceling requests, request double free
can be caused: one submitted req in .queue_rq can completed in
blk_mq_end_request(), meantime it can be completed in
nvme_cancel_request(). That is why we have to quiescing queue
first before canceling request in this way. Except for NVMe, looks
NBD and mtip32xx need fix too.

This way might cause blk_cleanup_queue() to complete early, then NULL
deref can be triggered in blk_mq_flush_busy_ctxs(). But in my previous
debug in PCI NVMe, this wasn't seen yet.

It should have been verified if the above is true by adding some debug
message inside blk_cleanup_queue(). 


Thanks,
Ming

  reply	other threads:[~2017-07-03 15:54 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-30 17:26 NVMe induced NULL deref in bt_iter() Jens Axboe
2017-07-02 10:45 ` Max Gurtovoy
2017-07-02 10:45   ` Max Gurtovoy
2017-07-02 11:56   ` Sagi Grimberg
2017-07-02 11:56     ` Sagi Grimberg
2017-07-02 14:37     ` Max Gurtovoy
2017-07-02 14:37       ` Max Gurtovoy
2017-07-02 15:08       ` Sagi Grimberg
2017-07-02 15:08         ` Sagi Grimberg
2017-07-03  9:40     ` Ming Lei
2017-07-03  9:40       ` Ming Lei
2017-07-03 10:07       ` Sagi Grimberg
2017-07-03 10:07         ` Sagi Grimberg
2017-07-03 12:03         ` Ming Lei
2017-07-03 12:03           ` Ming Lei
2017-07-03 12:46           ` Max Gurtovoy
2017-07-03 12:46             ` Max Gurtovoy
2017-07-03 15:54             ` Ming Lei [this message]
2017-07-03 15:54               ` Ming Lei
2017-07-04  6:58               ` Sagi Grimberg
2017-07-04  6:58                 ` Sagi Grimberg
2017-07-04  7:56           ` Sagi Grimberg
2017-07-04  7:56             ` Sagi Grimberg
2017-07-04  8:08             ` Ming Lei
2017-07-04  8:08               ` Ming Lei
2017-07-04  9:14               ` Sagi Grimberg
2017-07-04  9:14                 ` Sagi Grimberg
2017-07-03 16:01   ` Jens Axboe
2017-07-03 16:01     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170703155443.GC28651@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.