[PATCH] NVMe: use-after-free on surprise removal fixes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: willy@linux.intel.com (Matthew Wilcox)
Subject: [PATCH] NVMe: use-after-free on surprise removal fixes
Date: Thu, 2 Jan 2014 11:58:53 -0500	[thread overview]
Message-ID: <20140102165853.GN4945@linux.intel.com> (raw)
In-Reply-To: <1388528118-4866-1-git-send-email-keith.busch@intel.com>

On Tue, Dec 31, 2013@03:15:18PM -0700, Keith Busch wrote:
> A surprise removal of a device while a program has an open reference to
> the block handle may cause it to use the nvme dev, namespace, or queue
> after we've freed them. Reference count the opens on the block device,
> do not free the namespaces until the device ref count is 0, and protect
> queue access with RCU.

Brave.  I love it.

> diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
> index b59a93a..d71fc27 100644
> --- a/drivers/block/nvme-core.c
> +++ b/drivers/block/nvme-core.c
> @@ -263,12 +263,12 @@ static void *cancel_cmdid(struct nvme_queue *nvmeq, int cmdid,
>  
>  struct nvme_queue *get_nvmeq(struct nvme_dev *dev)
>  {
> -	return dev->queues[get_cpu() + 1];
> +	return rcu_dereference(dev->queues[part_stat_lock() + 1]);
>  }
>  
>  void put_nvmeq(struct nvme_queue *nvmeq)
>  {
> -	put_cpu();
> +	part_stat_unlock();
>  }
>  
>  /**

I'm not so keen on using part_stat_lock,unlock here.  If the part stats
switch to a different protection scheme, this will be broken.  Also,
I'm not entirely sure that we're safe in the !SMP case (can't we end up
without having called preempt_disable()?).  I think it's probably better
to do:

-	return dev->queues[get_cpu() + 1];
+	int queue;
+	rcu_read_lock();
+	queue = get_cpu() + 1;
+	return rcu_dereference(dev->queues[queue]);

> @@ -1155,10 +1155,20 @@ static void nvme_free_queues(struct nvme_dev *dev, int lowest)
>  {
>  	int i;
>  
> +	for (i = num_possible_cpus(); i > dev->queue_count - 1; i--)
> +		rcu_assign_pointer(dev->queues[i], NULL);
>  	for (i = dev->queue_count - 1; i >= lowest; i--) {
> -		nvme_free_queue(dev->queues[i]);
> +		struct nvme_queue *nvmeq = dev->queues[i];
> +
> +		spin_lock_irq(&nvmeq->q_lock);
> +		nvme_cancel_ios(nvmeq, false);
> +		spin_unlock_irq(&nvmeq->q_lock);
> +
> +		rcu_assign_pointer(dev->queues[i], NULL);
> +		synchronize_rcu();
> +
> +		nvme_free_queue(nvmeq);
>  		dev->queue_count--;
> -		dev->queues[i] = NULL;
>  	}
>  }
>  

I don't think you want to do this.  You're calling synchronize_rcu() for
each queue individually, which is extremely heavy-weight.  This routine
could take minutes to execute on a machine with a lot of RCU activity.

Two ways to fix this; accumulate the queues into a function-local array,
call synchronize_rcu() once, then free them all.  Alternatively, use
call_rcu() to have nvme_free_queue() executed after a grace period
has elapsed.

If we go that route, then as described in
Documentation/RCU/rcubarrier.txt, we will need to add a call to
rcu_barrier() in the module exit path to be sure that all the call_rcu()
functions have finished executing.  So there's probabaly no real
difference between the execution time of the two options in the case of
module unload, though it'd be visible in device hotplug.

By the way, this is an extremely common mistake people make when
introducing RCU.  If one can figure out a way to avoid calling
synchronize_rcu(), then one probably should.

Also, why is
		rcu_assign_pointer(dev->queues[i], NULL);
after cancelling the I/Os?  It seems to me that another thread could submit
an I/O between the cancellations and setting the queue to NULL.  I think
the pointer assignment needs to at least be within the spin_lock section,
but it could also be done before taking the spinlock ... right?

next prev parent reply	other threads:[~2014-01-02 16:58 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-31 22:15 [PATCH] NVMe: use-after-free on surprise removal fixes Keith Busch
2014-01-02 16:58 ` Matthew Wilcox [this message]
2014-01-06 16:12   ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140102165853.GN4945@linux.intel.com \
    --to=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.