public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/1] block: CFQ refcounting fix
@ 2005-08-30 22:41 brking
  2005-08-31  7:28 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: brking @ 2005-08-30 22:41 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, brking


I ran across a memory leak in the cfq I/O scheduler. The cfq init
function increments the refcnt of the associated request_queue, and that
reference is only dropped in cfq's exit function. Since blk_cleanup_queue
only calls the elevator exit function once the queue's refcnt reaches
zero, the exit function never runs and the request_queue is never freed.
None of the other I/O schedulers appear to take this reference, so I
removed the increment, and that fixed the leak for me.

To reproduce the problem, select cfq and repeatedly write "- - -" to the
scan sysfs attribute of a SCSI host, then watch free memory shrink.

Signed-off-by: Brian King <brking@us.ibm.com>
---

 linux-2.6-bjking1/drivers/block/cfq-iosched.c |    1 -
 1 files changed, 1 deletion(-)

diff -puN drivers/block/cfq-iosched.c~cfq_refcnt_fix drivers/block/cfq-iosched.c
--- linux-2.6/drivers/block/cfq-iosched.c~cfq_refcnt_fix	2005-08-30 17:26:55.000000000 -0500
+++ linux-2.6-bjking1/drivers/block/cfq-iosched.c	2005-08-30 17:26:55.000000000 -0500
@@ -2318,7 +2318,6 @@ static int cfq_init_queue(request_queue_
 	e->elevator_data = cfqd;
 
 	cfqd->queue = q;
-	atomic_inc(&q->refcnt);
 
 	cfqd->max_queued = q->nr_requests / 4;
 	q->nr_batching = cfq_queued;
_
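
The leak described in this patch can be modeled in user space. The
sketch below is only an illustration under stated assumptions: toy_queue
and the toy_* functions are hypothetical stand-ins, not kernel code, and
the only behavior they copy from the description is that cleanup runs
the scheduler's exit hook solely when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for request_queue; not kernel code. */
struct toy_queue {
    int  refcnt;             /* models q->refcnt */
    bool elevator_holds_ref; /* did the scheduler's init hook take a reference? */
    bool elevator_live;      /* scheduler data is still allocated */
    bool freed;              /* the queue itself has been released */
};

/* Models the scheduler init hook; take_ref mirrors the atomic_inc() in the patch. */
static void toy_elevator_init(struct toy_queue *q, bool take_ref)
{
    q->elevator_live = true;
    q->elevator_holds_ref = take_ref;
    if (take_ref)
        q->refcnt++;
}

/* Models the scheduler exit hook, the only place the extra reference is dropped. */
static void toy_elevator_exit(struct toy_queue *q)
{
    if (q->elevator_holds_ref) {
        q->elevator_holds_ref = false;
        q->refcnt--;
    }
    q->elevator_live = false;
}

/* Models queue cleanup: the exit hook runs only once the count reaches zero. */
static void toy_cleanup_queue(struct toy_queue *q)
{
    assert(q->refcnt > 0);
    if (--q->refcnt != 0)
        return;              /* someone still holds a reference */
    toy_elevator_exit(q);
    q->freed = true;
}

/* Runs init followed by cleanup and returns the final state of the queue. */
static struct toy_queue toy_lifecycle(bool take_ref)
{
    struct toy_queue q = { .refcnt = 1 };   /* the owner's reference */

    toy_elevator_init(&q, take_ref);
    toy_cleanup_queue(&q);
    return q;
}
```

With take_ref set, toy_lifecycle() returns a queue that is neither freed
nor rid of its scheduler data, because the one reference left is the one
that only the exit hook would drop. With take_ref clear, cleanup reaches
zero, runs the exit hook, and frees the queue.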


* Re: [PATCH 1/1] block: CFQ refcounting fix
  2005-08-30 22:41 [PATCH 1/1] block: CFQ refcounting fix brking
@ 2005-08-31  7:28 ` Jens Axboe
  2005-08-31 13:40   ` Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2005-08-31  7:28 UTC (permalink / raw)
  To: brking; +Cc: linux-kernel

On Tue, Aug 30 2005, brking@us.ibm.com wrote:
> 
> I ran across a memory leak related to the cfq scheduler. The cfq
> init function increments the refcnt of the associated request_queue.
> This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
> only calls the elevator exit function when its refcnt goes to zero, the
> request_q never gets cleaned up. It didn't look like other io schedulers were
> incrementing this refcnt, so I removed the refcnt increment and it fixed the
> memory leak for me.
> 
> To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
> attribute to scan "- - -" repeatedly on a scsi host and watch the memory
> vanish.

Yeah, that actually looks like a dangling reference. I assume you tested
this properly?

-- 
Jens Axboe



* Re: [PATCH 1/1] block: CFQ refcounting fix
  2005-08-31  7:28 ` Jens Axboe
@ 2005-08-31 13:40   ` Brian King
  2005-08-31 13:43     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2005-08-31 13:40 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Jens Axboe wrote:
> On Tue, Aug 30 2005, brking@us.ibm.com wrote:
> 
>>I ran across a memory leak related to the cfq scheduler. The cfq
>>init function increments the refcnt of the associated request_queue.
>>This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
>>only calls the elevator exit function when its refcnt goes to zero, the
>>request_q never gets cleaned up. It didn't look like other io schedulers were
>>incrementing this refcnt, so I removed the refcnt increment and it fixed the
>>memory leak for me.
>>
>>To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
>>attribute to scan "- - -" repeatedly on a scsi host and watch the memory
>>vanish.
> 
> 
> Yeah, that actually looks like a dangling reference. I assume you tested
> this properly?

Yes. I applied the patch and booted my system, which had previously been
crashing during boot with out-of-memory errors caused by the leak. I then
ran the scan a few times, verified that free memory in /proc/meminfo no
longer kept decreasing as it did without the patch, and rebooted again.
If there is anything else you would like me to test, I would be happy to
do so.

Thanks

Brian


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center
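
The /proc/meminfo check described in this message can be scripted. The
sketch below is one possible helper, not something from the thread:
meminfo_field_kb is a hypothetical name, and the code assumes the usual
"Key:   value kB" line layout of /proc/meminfo. It takes the file
contents as a string so it can be tried without a live system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the value in kB of the line starting with "<key>:" in a
 * /proc/meminfo-style text, or -1 if the key is missing or malformed.
 * Hypothetical helper for illustration only.
 */
static long meminfo_field_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);

    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Match only at the start of a line, and require the colon. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;

            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Reading /proc/meminfo into a buffer before and after each scan and
comparing meminfo_field_kb(buf, "MemFree") gives a rough view of whether
free memory keeps shrinking. Other activity on the system also moves
that number, so only a steady downward trend across many scans is
meaningful.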


* Re: [PATCH 1/1] block: CFQ refcounting fix
  2005-08-31 13:40   ` Brian King
@ 2005-08-31 13:43     ` Jens Axboe
  2005-08-31 13:57       ` Brian King
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2005-08-31 13:43 UTC (permalink / raw)
  To: Brian King; +Cc: linux-kernel

On Wed, Aug 31 2005, Brian King wrote:
> Jens Axboe wrote:
> > On Tue, Aug 30 2005, brking@us.ibm.com wrote:
> > 
> >>I ran across a memory leak related to the cfq scheduler. The cfq
> >>init function increments the refcnt of the associated request_queue.
> >>This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
> >>only calls the elevator exit function when its refcnt goes to zero, the
> >>request_q never gets cleaned up. It didn't look like other io schedulers were
> >>incrementing this refcnt, so I removed the refcnt increment and it fixed the
> >>memory leak for me.
> >>
> >>To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
> >>attribute to scan "- - -" repeatedly on a scsi host and watch the memory
> >>vanish.
> > 
> > 
> > Yeah, that actually looks like a dangling reference. I assume you tested
> > this properly?
> 
> Yes. I applied the patch, booted my system (which was crashing on
> bootup before due to out of memory errors due to the leak) ran the
> scan a few times and verified /proc/meminfo didn't continually
> decrease like without it, and rebooted again.  If there is anything
> else you would like me to do, I would be happy to do so.

I think you need to remove the blk_put_queue() in cfq_put_cfqd() as
well, otherwise I don't see how this can work without looking at freed
memory. I'll audit the other paths as well.

-- 
Jens Axboe
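
The concern raised in this message is that dropping the increment while
keeping the matching put leaves the counts unbalanced. The sketch below
illustrates that general hazard in user space. It is not the exact
kernel path: toy_obj, toy_get, toy_put and toy_teardown are hypothetical
names, and the "late put" counter only stands in for what would be an
access to freed memory in real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not kernel code. */
struct toy_obj {
    int  refcnt;
    bool freed;
    int  late_puts;  /* puts issued after the object was already freed */
};

static void toy_get(struct toy_obj *o)
{
    assert(!o->freed);
    o->refcnt++;
}

static void toy_put(struct toy_obj *o)
{
    if (o->freed) {
        o->late_puts++;  /* real code would be touching freed memory here */
        return;
    }
    if (--o->refcnt == 0)
        o->freed = true;
}

struct toy_result {
    bool freed_while_owned; /* released while the owner still held a reference */
    int  late_puts;
};

/* Simulates scheduler init, scheduler exit, then the owner's final put. */
static struct toy_result toy_teardown(bool init_takes_ref, bool exit_drops_ref)
{
    struct toy_obj q = { .refcnt = 1 };  /* the owner's reference */
    struct toy_result r;

    if (init_takes_ref)
        toy_get(&q);   /* models the increment in the init hook */
    if (exit_drops_ref)
        toy_put(&q);   /* models the put in the exit path */

    r.freed_while_owned = q.freed;
    toy_put(&q);       /* the owner drops its own reference last */
    r.late_puts = q.late_puts;
    return r;
}
```

toy_teardown(false, true) corresponds to removing only the increment:
the exit-side put consumes the owner's reference, so the object is freed
while still owned and the owner's own put arrives late. Removing both
sides, toy_teardown(false, false), keeps the counts balanced.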



* Re: [PATCH 1/1] block: CFQ refcounting fix
  2005-08-31 13:43     ` Jens Axboe
@ 2005-08-31 13:57       ` Brian King
  2005-08-31 15:50         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Brian King @ 2005-08-31 13:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Jens Axboe wrote:
> On Wed, Aug 31 2005, Brian King wrote:
> 
>>Jens Axboe wrote:
>>
>>>On Tue, Aug 30 2005, brking@us.ibm.com wrote:
>>>
>>>
>>>>I ran across a memory leak related to the cfq scheduler. The cfq
>>>>init function increments the refcnt of the associated request_queue.
>>>>This refcount gets decremented in cfq's exit function. Since blk_cleanup_queue
>>>>only calls the elevator exit function when its refcnt goes to zero, the
>>>>request_q never gets cleaned up. It didn't look like other io schedulers were
>>>>incrementing this refcnt, so I removed the refcnt increment and it fixed the
>>>>memory leak for me.
>>>>
>>>>To reproduce the problem, simply use cfq and use the scsi_host scan sysfs
>>>>attribute to scan "- - -" repeatedly on a scsi host and watch the memory
>>>>vanish.
>>>
>>>
>>>Yeah, that actually looks like a dangling reference. I assume you tested
>>>this properly?
>>
>>Yes. I applied the patch, booted my system (which was crashing on
>>bootup before due to out of memory errors due to the leak) ran the
>>scan a few times and verified /proc/meminfo didn't continually
>>decrease like without it, and rebooted again.  If there is anything
>>else you would like me to do, I would be happy to do so.
> 
> 
> I think you need to remove the blk_put_queue() in cfq_put_cfqd() as
> well, otherwise I don't see how this can work without looking at freed
> memory. I'll audit the other paths as well.

Good catch. Here is an updated patch. 


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: cfq_refcnt_fix.patch --]
[-- Type: text/plain, Size: 1442 bytes --]


I ran across a memory leak in the cfq I/O scheduler. The cfq init
function increments the refcnt of the associated request_queue, and that
reference is only dropped in cfq's exit function. Since blk_cleanup_queue
only calls the elevator exit function once the queue's refcnt reaches
zero, the exit function never runs and the request_queue is never freed.
None of the other I/O schedulers appear to take this reference, so I
removed the increment, along with the matching blk_put_queue() in
cfq_put_cfqd(), and that fixed the leak for me.

To reproduce the problem, select cfq and repeatedly write "- - -" to the
scan sysfs attribute of a SCSI host, then watch free memory shrink.

Signed-off-by: Brian King <brking@us.ibm.com>
---

 linux-2.6-bjking1/drivers/block/cfq-iosched.c |    3 ---
 1 files changed, 3 deletions(-)

diff -puN drivers/block/cfq-iosched.c~cfq_refcnt_fix drivers/block/cfq-iosched.c
--- linux-2.6/drivers/block/cfq-iosched.c~cfq_refcnt_fix	2005-08-30 17:26:55.000000000 -0500
+++ linux-2.6-bjking1/drivers/block/cfq-iosched.c	2005-08-31 08:48:30.000000000 -0500
@@ -2260,8 +2260,6 @@ static void cfq_put_cfqd(struct cfq_data
 	if (!atomic_dec_and_test(&cfqd->ref))
 		return;
 
-	blk_put_queue(q);
-
 	cfq_shutdown_timer_wq(cfqd);
 	q->elevator->elevator_data = NULL;
 
@@ -2318,7 +2316,6 @@ static int cfq_init_queue(request_queue_
 	e->elevator_data = cfqd;
 
 	cfqd->queue = q;
-	atomic_inc(&q->refcnt);
 
 	cfqd->max_queued = q->nr_requests / 4;
 	q->nr_batching = cfq_queued;
_


* Re: [PATCH 1/1] block: CFQ refcounting fix
  2005-08-31 13:57       ` Brian King
@ 2005-08-31 15:50         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2005-08-31 15:50 UTC (permalink / raw)
  To: Brian King; +Cc: linux-kernel

On Wed, Aug 31 2005, Brian King wrote:
> diff -puN drivers/block/cfq-iosched.c~cfq_refcnt_fix drivers/block/cfq-iosched.c
> --- linux-2.6/drivers/block/cfq-iosched.c~cfq_refcnt_fix	2005-08-30 17:26:55.000000000 -0500
> +++ linux-2.6-bjking1/drivers/block/cfq-iosched.c	2005-08-31 08:48:30.000000000 -0500
> @@ -2260,8 +2260,6 @@ static void cfq_put_cfqd(struct cfq_data
>  	if (!atomic_dec_and_test(&cfqd->ref))
>  		return;
>  
> -	blk_put_queue(q);
> -
>  	cfq_shutdown_timer_wq(cfqd);
>  	q->elevator->elevator_data = NULL;
>  
> @@ -2318,7 +2316,6 @@ static int cfq_init_queue(request_queue_
>  	e->elevator_data = cfqd;
>  
>  	cfqd->queue = q;
> -	atomic_inc(&q->refcnt);
>  
>  	cfqd->max_queued = q->nr_requests / 4;
>  	q->nr_batching = cfq_queued;
> _

That looks better. I'll add this to my outgoing queue, thanks!

-- 
Jens Axboe


