From: Suparna Bhattacharya <suparna@in.ibm.com>
To: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: linux-kernel@vger.kernel.org, "'Andrew Morton'" <akpm@osdl.org>,
"'Zach Brown'" <zach.brown@oracle.com>,
pbadari@gmail.com
Subject: Re: [patch] bug fix in dio handling write error - v2
Date: Sun, 19 Mar 2006 17:24:59 +0530 [thread overview]
Message-ID: <20060319115458.GA29422@in.ibm.com> (raw)
In-Reply-To: <200603190927.k2J9RSg11077@unix-os.sc.intel.com>
On Sun, Mar 19, 2006 at 01:27:33AM -0800, Chen, Kenneth W wrote:
> Referring to original posting:
> http://marc.theaimsgroup.com/?t=113752710100001&r=1&w=2
>
> Suparna pointed out that this fix has a potential race window. I think
> the race condition also exists on the READ side currently. The fundamental
> problem is that dio->result is overloaded with dual use: an indicator of
> fall back path for partial dio write, and an error indicator used in the
> I/O completion path. In the event of device error, the setting of -EIO
> to dio->result clashes with value used to track partial write that activates
> the fall back path.
>
> One way to fix the race issue is to pull all the code that uses dio->result
> up before I/O is submitted and then never look at dio->result once IO is
> submitted. Or alternatively another independent variable can be introduce
> to track fall back state, note that to set the tracking logic, kernel still
> have to look at dio->result before I/O is submitted.
>
> The following patch implements the first option. I would appreciate some
> reviews. Thanks.
>
>
>
> [patch] bug fix in dio handling write error
>
> There is a bug in direct-io on propagating write error up to the
> higher I/O layer. When performing an async ODIRECT write to a
> block device, if a device error occurred (like media error or disk
> is pulled), the error code is only propagated from device driver
> to the DIO layer. The error code stops at finished_one_bio(). The
> aysnc write, however, is supposedly have a corresponding AIO event
> with appropriate return code (in this case -EIO). Application
> which waits on the async write event, will hang forever since such
> AIO event is lost forever (if such app did not use the timeout
> option in io_getevents call. Regardless, an AIO event is lost).
>
> The discovery of above bug leads to another discovery of potential
> race window with dio->result. The fundamental problem is that
> dio->result is overloaded with dual use: an indicator of fall back
> path for partial dio write, and an error indicator used in the I/O
> completion path. In the event of device error, the setting of -EIO
> to dio->result clashes with value used to track partial write that
> activates the fall back path.
>
> Propose to fix the race issue by pulling all the code that uses
> dio->result up before I/O is submitted and then never look at
> dio->result once IO is submitted.
>
> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
>
> --- ./fs/direct-io.c.orig 2006-01-02 19:21:10.000000000 -0800
> +++ ./fs/direct-io.c 2006-03-19 01:57:49.797391064 -0800
> @@ -253,8 +253,7 @@ static void finished_one_bio(struct dio
> dio_complete(dio, offset, transferred);
>
> /* Complete AIO later if falling back to buffered i/o */
> - if (dio->result == dio->size ||
> - ((dio->rw == READ) && dio->result)) {
> + if (!dio->waiter) {
> aio_complete(dio->iocb, transferred, 0);
> kfree(dio);
> return;
> @@ -939,7 +938,7 @@ direct_io_worker(int rw, struct kiocb *i
> struct dio *dio)
> {
> unsigned long user_addr;
> - int seg;
> + int seg, should_wait = 0;
> ssize_t ret = 0;
> ssize_t ret2;
> size_t bytes;
> @@ -1031,6 +1030,16 @@ direct_io_worker(int rw, struct kiocb *i
> }
> } /* end iovec loop */
>
> + /*
> + * We'll have to wait for a partial write to drain before falling back
> + * to buffered. We check this before submitting io so that completion
> + * doesn't have a chance to overwrite dio->result with -EIO.
> + */
> + if (dio->result < dio->size && dio->is_async && rw == WRITE) {
> + dio->waiter = current;
> + should_wait = 1;
> + }
> +
Isn't there a possibility that part of the IO for the overall request
may already have been submitted at this point ? (i.e. within
do_direct_IO->submit_page_section ->dio_send_cur_page->dio_bio_submit)
This is what I was referring to in my earlier response to Zach's patch.
> if (ret == -ENOTBLK && rw == WRITE) {
> /*
> * The remaining part of the request will be
> @@ -1051,6 +1060,7 @@ direct_io_worker(int rw, struct kiocb *i
> page_cache_release(dio->cur_page);
> dio->cur_page = NULL;
> }
> + ret2 = dio->result;
> if (dio->bio)
> dio_bio_submit(dio);
>
> @@ -1073,14 +1083,8 @@ direct_io_worker(int rw, struct kiocb *i
> * reflect the number of to-be-processed BIOs.
> */
> if (dio->is_async) {
> - int should_wait = 0;
> -
> - if (dio->result < dio->size && rw == WRITE) {
> - dio->waiter = current;
> - should_wait = 1;
> - }
> if (ret == 0)
> - ret = dio->result;
> + ret = ret2;
> finished_one_bio(dio); /* This can free the dio */
> blk_run_address_space(inode->i_mapping);
> if (should_wait) {
>
>
>
Regards
Suparna
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
next prev parent reply other threads:[~2006-03-19 11:55 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-19 9:27 [patch] bug fix in dio handling write error - v2 Chen, Kenneth W
2006-03-19 11:54 ` Suparna Bhattacharya [this message]
2006-03-19 22:27 ` Chen, Kenneth W
2006-03-20 21:28 ` Zach Brown
2006-03-21 7:06 ` Suparna Bhattacharya
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060319115458.GA29422@in.ibm.com \
--to=suparna@in.ibm.com \
--cc=akpm@osdl.org \
--cc=kenneth.w.chen@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pbadari@gmail.com \
--cc=zach.brown@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox