linux-kernel.vger.kernel.org archive mirror
* [PATCH] cfq-iosched: fix ioc_data leak
@ 2008-04-04  1:01 Fabio Checconi
  2008-04-04 11:46 ` Jens Axboe
  0 siblings, 1 reply; 2+ messages in thread
From: Fabio Checconi @ 2008-04-04  1:01 UTC
  To: axboe; +Cc: linux-kernel

Hi,
    while stress testing module loading and unloading on a derived scheduler,
I hit crashes in cfq caused by what seems to be an error in the caching of
cic lookup results in the ioc_data field of io contexts.

As what's happening is a little bit involved (at least for me), I've put
together some of the collected oopses with and without debug patches, the
patches themselves, a script used to reproduce the problem and the .config
used on kvm/qemu here, hoping that they can explain the problem better than
words:

    http://feanor.sssup.it/~fabio/linux/cfq-ioc-data/

The patch below should fix the problem.


Subject: cfq-iosched: do not leak ioc_data across iosched switches

When switching away from cfq, cfq_exit_queue() does not clear
ioc->ioc_data, leaving a dangling pointer that can deceive later lookups
once the iosched is switched back to cfq.  The following sequence can
trigger it:

    - elevator switch from cfq to something else;
    - module unloading, with elv_unregister() calling cfq_free_io_context()
      on the ioc, which frees the cic (via the .trim op);
    - module gets reloaded and the elevator switches back to cfq;
    - reallocation of a cic at the same address as before (with a valid key).

To fix it, assign NULL to ioc_data in __cfq_exit_single_io_context(),
which is called both from the regular exit path and from the elevator
switching code.  The only path that frees a cic and is not covered is the
error handling one, but cics freed that way are never cached in ioc_data.

Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
---
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 0f962ec..67cd023 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1207,6 +1207,8 @@ static void cfq_exit_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 static void __cfq_exit_single_io_context(struct cfq_data *cfqd,
 					 struct cfq_io_context *cic)
 {
+	struct io_context *ioc = cic->ioc;
+
 	list_del_init(&cic->queue_list);
 
 	/*
@@ -1216,6 +1218,9 @@ static void __cfq_exit_single_io_context(struct cfq_data *cfqd,
 	cic->dead_key = (unsigned long) cic->key;
 	cic->key = NULL;
 
+	if (ioc->ioc_data == cic)
+		rcu_assign_pointer(ioc->ioc_data, NULL);
+
 	if (cic->cfqq[ASYNC]) {
 		cfq_exit_cfqq(cfqd, cic->cfqq[ASYNC]);
 		cic->cfqq[ASYNC] = NULL;
@@ -1248,8 +1253,7 @@ static void cfq_exit_single_io_context(struct io_context *ioc,
  */
 static void cfq_exit_io_context(struct io_context *ioc)
 {
-	rcu_assign_pointer(ioc->ioc_data, NULL);
 	call_for_each_cic(ioc, cfq_exit_single_io_context);
 }
 
 static struct cfq_io_context *
@@ -1480,8 +1485,7 @@ cfq_drop_dead_cic(struct cfq_data *cfqd, struct io_context *ioc,
 
 	spin_lock_irqsave(&ioc->lock, flags);
 
-	if (ioc->ioc_data == cic)
-		rcu_assign_pointer(ioc->ioc_data, NULL);
+	BUG_ON(ioc->ioc_data == cic);
 
 	radix_tree_delete(&ioc->radix_root, (unsigned long) cfqd);
 	hlist_del_rcu(&cic->cic_list);



* Re: [PATCH] cfq-iosched: fix ioc_data leak
  2008-04-04  1:01 [PATCH] cfq-iosched: fix ioc_data leak Fabio Checconi
@ 2008-04-04 11:46 ` Jens Axboe
  0 siblings, 0 replies; 2+ messages in thread
From: Jens Axboe @ 2008-04-04 11:46 UTC
  To: linux-kernel; +Cc: fchecconi

Your analysis and fix look correct, thanks a lot!

-- 
Jens Axboe

