public inbox for linux-kernel@vger.kernel.org
* [patch]cfq-iosched: fix cfq_cic_link() race condition
@ 2011-12-02  8:05 Yasuaki Ishimatsu
  2011-12-02  9:05 ` Jens Axboe
  2011-12-02 13:49 ` Vivek Goyal
  0 siblings, 2 replies; 3+ messages in thread
From: Yasuaki Ishimatsu @ 2011-12-02  8:05 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: linux-kernel

cfq_cic_link() has a race condition. When several processes that share an ioc
issue I/O to the same block device simultaneously, cfq_cic_link() sometimes
returns -EEXIST. The race condition can stall I/O through the following steps:

step  1: Process A: Issue an I/O to /dev/sda
step  2: Process A: Get an ioc (iocA here) in get_io_context() which is not
		    yet linked with a cic for the device
step  3: Process A: Get a new cic for the device (cicA here) in
		    cfq_alloc_io_context()

step  4: Process B: Issue an I/O to /dev/sda
step  5: Process B: Get iocA in get_io_context() since processes A and B share
		    the same ioc
step  6: Process B: Get a new cic for the device (cicB here) in
		    cfq_alloc_io_context() since iocA has not been linked with a
		    cic for the device yet

step  7: Process A: Link cicA to iocA in cfq_cic_link()
step  8: Process A: Dispatch I/O to driver and finish it

step  9: Process B: Try to link cicB to iocA in cfq_cic_link()
		    But it fails and prints the "cfq: cic link failed!" kernel
		    message, since iocA was already linked with cicA at step 7.
step 10: Process B: Wait for I/O to finish in get_request_wait()
		    The function never wakes up when there is no other I/O to
		    the device.
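
The interleaving above can be replayed in a small userspace model. This is
only a sketch under stated assumptions: struct model_ioc, struct model_cic,
model_lookup(), model_link() and replay_race() are hypothetical stand-ins
written for illustration, not the kernel's data structures or functions, and
the model tracks a single device with no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's structures. */
struct model_cic { int id; };
struct model_ioc { struct model_cic *linked; };	/* one device only */

/* Plays the role of cfq_cic_lookup(): NULL until a cic has been linked. */
static struct model_cic *model_lookup(const struct model_ioc *ioc)
{
	assert(ioc != NULL);
	return ioc->linked;
}

/* Plays the role of cfq_cic_link(): only the first link succeeds. */
static int model_link(struct model_ioc *ioc, struct model_cic *cic)
{
	assert(ioc != NULL && cic != NULL);
	if (ioc->linked)
		return -EEXIST;
	ioc->linked = cic;
	return 0;
}

/*
 * Replays steps 2-9 in the order listed above and returns the result of
 * process B's link attempt. Returns 1 if the model behaves unexpectedly.
 */
static int replay_race(void)
{
	struct model_ioc iocA = { NULL };
	struct model_cic cicA = { 1 }, cicB = { 2 };

	if (model_lookup(&iocA))	/* steps 2-3: A finds no cic, takes cicA */
		return 1;
	if (model_lookup(&iocA))	/* steps 5-6: B finds no cic, takes cicB */
		return 1;
	if (model_link(&iocA, &cicA))	/* step 7: A links cicA */
		return 1;
	return model_link(&iocA, &cicB);	/* step 9: B's link attempt */
}
```

In this model, replay_race() returns -EEXIST, which corresponds to the
failure process B sees at step 9.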

When cfq_cic_link() returns -EEXIST, the ioc is already linked with a cic.
So in that case, free the unused cic and retry cfq_cic_lookup().
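
The retry idea can be sketched outside the kernel as a lookup-or-create loop.
Everything below is an illustrative assumption: the sketch_* names are
hypothetical, malloc()/free() stand in for the cic allocator, and the
before_link hook exists only to imitate another task linking first. The
actual change is the diff further down.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; NOT the kernel's structures. */
struct sketch_cic { int owner; };
struct sketch_ioc { struct sketch_cic *linked; };

static struct sketch_cic *sketch_lookup(struct sketch_ioc *ioc)
{
	return ioc->linked;
}

/* Returns -EEXIST when another cic is already linked, 0 otherwise. */
static int sketch_link(struct sketch_ioc *ioc, struct sketch_cic *cic)
{
	if (ioc->linked)
		return -EEXIST;
	ioc->linked = cic;
	return 0;
}

/* Test hook: imitates a second task that links its own cic first. */
static void sketch_other_task_links(struct sketch_ioc *ioc)
{
	static struct sketch_cic winner = { 1 };

	if (!ioc->linked)
		ioc->linked = &winner;
}

/* Look up the cic; if absent, allocate and link one, retrying on -EEXIST. */
static struct sketch_cic *sketch_get_cic(struct sketch_ioc *ioc, int owner,
					 void (*before_link)(struct sketch_ioc *))
{
	struct sketch_cic *cic;

	assert(ioc != NULL);
retry:
	cic = sketch_lookup(ioc);
	if (cic)
		return cic;

	cic = malloc(sizeof(*cic));
	if (!cic)
		return NULL;
	cic->owner = owner;

	if (before_link)
		before_link(ioc);	/* window in which the race can occur */

	if (sketch_link(ioc, cic) == -EEXIST) {
		/* someone else linked first: drop ours, use theirs */
		free(cic);
		goto retry;
	}
	return cic;
}
```

Calling sketch_get_cic(&ioc, 2, sketch_other_task_links) on an empty ioc
returns the cic linked by the hook instead of failing, because the second
pass through the loop finds it in sketch_lookup().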

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
---
 block/cfq-iosched.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-3.2-rc3-fix/block/cfq-iosched.c
===================================================================
--- linux-3.2-rc3-fix.orig/block/cfq-iosched.c	2011-11-30 18:32:28.000000000 +0900
+++ linux-3.2-rc3-fix/block/cfq-iosched.c	2011-11-30 18:24:02.000000000 +0900
@@ -3184,7 +3184,7 @@
 		}
 	}

-	if (ret)
+	if (ret && ret != -EEXIST)
 		printk(KERN_ERR "cfq: cic link failed!\n");

 	return ret;
@@ -3200,6 +3200,7 @@
 {
 	struct io_context *ioc = NULL;
 	struct cfq_io_context *cic;
+	int ret;

 	might_sleep_if(gfp_mask & __GFP_WAIT);

@@ -3207,6 +3208,7 @@
 	if (!ioc)
 		return NULL;

+retry:
 	cic = cfq_cic_lookup(cfqd, ioc);
 	if (cic)
 		goto out;
@@ -3215,7 +3217,12 @@
 	if (cic == NULL)
 		goto err;

-	if (cfq_cic_link(cfqd, ioc, cic, gfp_mask))
+	ret = cfq_cic_link(cfqd, ioc, cic, gfp_mask);
+	if (ret == -EEXIST) {
+		/* someone has linked cic to ioc already */
+		cfq_cic_free(cic);
+		goto retry;
+	} else if (ret)
 		goto err_free;

 out:



* Re: [patch]cfq-iosched: fix cfq_cic_link() race condition
From: Jens Axboe @ 2011-12-02  9:05 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: vgoyal, linux-kernel

On 2011-12-02 09:05, Yasuaki Ishimatsu wrote:
> When cfq_cic_link() returns -EEXIST, the ioc is already linked with a cic.
> So in that case, free the unused cic and retry cfq_cic_lookup().

Thanks, your analysis and fix look correct. Good work! Applied.

-- 
Jens Axboe



* Re: [patch]cfq-iosched: fix cfq_cic_link() race condition
From: Vivek Goyal @ 2011-12-02 13:49 UTC (permalink / raw)
  To: Yasuaki Ishimatsu; +Cc: axboe, linux-kernel

On Fri, Dec 02, 2011 at 05:05:48PM +0900, Yasuaki Ishimatsu wrote:
> When cfq_cic_link() returns -EEXIST, the ioc is already linked with a cic.
> So in that case, free the unused cic and retry cfq_cic_lookup().
> 
> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

Looks good to me.  Thanks.

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

