All of lore.kernel.org
 help / color / mirror / Atom feed
From: Suparna Bhattacharya <suparna@in.ibm.com>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: linux-kernel@vger.kernel.org, "'Andrew Morton'" <akpm@osdl.org>,
	"'Zach Brown'" <zach.brown@oracle.com>,
	pbadari@gmail.com
Subject: Re: [patch] bug fix in dio handling write error - v2
Date: Sun, 19 Mar 2006 17:24:59 +0530	[thread overview]
Message-ID: <20060319115458.GA29422@in.ibm.com> (raw)
In-Reply-To: <200603190927.k2J9RSg11077@unix-os.sc.intel.com>

On Sun, Mar 19, 2006 at 01:27:33AM -0800, Chen, Kenneth W wrote:
> Referring to original posting:
> http://marc.theaimsgroup.com/?t=113752710100001&r=1&w=2
> 
> Suparna pointed out that this fix has a potential race window.  I think
> the race condition also exists on the READ side currently. The fundamental
> problem is that dio->result is overloaded with dual use: an indicator of
> fall back path for partial dio write, and an error indicator used in the
> I/O completion path.  In the event of device error, the setting of -EIO
> to dio->result clashes with value used to track partial write that activates
> the fall back path.
> 
> One way to fix the race issue is to pull all the code that uses dio->result
> up before I/O is submitted and then never look at dio->result once IO is
> submitted.  Or alternatively another independent variable can be introduce
> to track fall back state, note that to set the tracking logic, kernel still
> have to look at dio->result before I/O is submitted.
> 
> The following patch implements the first option. I would appreciate some
> reviews.  Thanks.
> 
> 
> 
> [patch] bug fix in dio handling write error
> 
> There is a bug in direct-io on propagating write error up to the
> higher I/O layer.  When performing an async ODIRECT write to a
> block device, if a device error occurred (like media error or disk
> is pulled), the error code is only propagated from device driver
> to the DIO layer.  The error code stops at finished_one_bio(). The
> aysnc write, however, is supposedly have a corresponding AIO event
> with appropriate return code (in this case -EIO).  Application
> which waits on the async write event, will hang forever since such
> AIO event is lost forever (if such app did not use the timeout
> option in io_getevents call. Regardless, an AIO event is lost).
> 
> The discovery of above bug leads to another discovery of potential
> race window with dio->result.  The fundamental problem is that
> dio->result is overloaded with dual use: an indicator of fall back
> path for partial dio write, and an error indicator used in the I/O
> completion path.  In the event of device error, the setting of -EIO
> to dio->result clashes with value used to track partial write that
> activates the fall back path.
> 
> Propose to fix the race issue by pulling all the code that uses
> dio->result up before I/O is submitted and then never look at
> dio->result once IO is submitted.
> 
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
> 
> --- ./fs/direct-io.c.orig	2006-01-02 19:21:10.000000000 -0800
> +++ ./fs/direct-io.c	2006-03-19 01:57:49.797391064 -0800
> @@ -253,8 +253,7 @@ static void finished_one_bio(struct dio 
>  			dio_complete(dio, offset, transferred);
>  
>  			/* Complete AIO later if falling back to buffered i/o */
> -			if (dio->result == dio->size ||
> -				((dio->rw == READ) && dio->result)) {
> +			if (!dio->waiter) {
>  				aio_complete(dio->iocb, transferred, 0);
>  				kfree(dio);
>  				return;
> @@ -939,7 +938,7 @@ direct_io_worker(int rw, struct kiocb *i
>  	struct dio *dio)
>  {
>  	unsigned long user_addr; 
> -	int seg;
> +	int seg, should_wait = 0;
>  	ssize_t ret = 0;
>  	ssize_t ret2;
>  	size_t bytes;
> @@ -1031,6 +1030,16 @@ direct_io_worker(int rw, struct kiocb *i
>  		}
>  	} /* end iovec loop */
>  
> +	/*
> +	 * We'll have to wait for a partial write to drain before falling back
> +	 * to buffered.  We check this before submitting io so that completion
> +	 * doesn't have a chance to overwrite dio->result with -EIO.
> +	 */
> +	if (dio->result < dio->size && dio->is_async && rw == WRITE) {
> +		dio->waiter = current;
> +		should_wait = 1;
> +	}
> +

Isn't there a possibility that part of the IO for the overall request
may already have been submitted at this point ? (i.e. within
do_direct_IO->submit_page_section ->dio_send_cur_page->dio_bio_submit) 
This is what I was referring to in my earlier response to Zach's patch.

>  	if (ret == -ENOTBLK && rw == WRITE) {
>  		/*
>  		 * The remaining part of the request will be
> @@ -1051,6 +1060,7 @@ direct_io_worker(int rw, struct kiocb *i
>  		page_cache_release(dio->cur_page);
>  		dio->cur_page = NULL;
>  	}
> +	ret2 = dio->result;
>  	if (dio->bio)
>  		dio_bio_submit(dio);
>  
> @@ -1073,14 +1083,8 @@ direct_io_worker(int rw, struct kiocb *i
>  	 * reflect the number of to-be-processed BIOs.
>  	 */
>  	if (dio->is_async) {
> -		int should_wait = 0;
> -
> -		if (dio->result < dio->size && rw == WRITE) {
> -			dio->waiter = current;
> -			should_wait = 1;
> -		}
>  		if (ret == 0)
> -			ret = dio->result;
> +			ret = ret2;
>  		finished_one_bio(dio);		/* This can free the dio */
>  		blk_run_address_space(inode->i_mapping);
>  		if (should_wait) {
> 
> 
> 

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


  reply	other threads:[~2006-03-19 11:55 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-19  9:27 [patch] bug fix in dio handling write error - v2 Chen, Kenneth W
2006-03-19 11:54 ` Suparna Bhattacharya [this message]
2006-03-19 22:27   ` Chen, Kenneth W
2006-03-20 21:28     ` Zach Brown
2006-03-21  7:06     ` Suparna Bhattacharya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060319115458.GA29422@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbadari@gmail.com \
    --cc=zach.brown@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.