All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junxiao Bi <junxiao.bi@oracle.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: ocfs2-devel@oss.oracle.com, linux-aio@kvack.org,
	mfasheh@suse.com, jlbec@evilplan.org, bcrl@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	joe.jin@oracle.com
Subject: [Ocfs2-devel] [PATCH 1/2] aio: make kiocb->private NUll in init_sync_kiocb()
Date: Sat, 02 Jun 2012 10:59:05 +0800	[thread overview]
Message-ID: <4FC98179.4000400@oracle.com> (raw)
In-Reply-To: <x49r4ty69bw.fsf@segfault.boston.devel.redhat.com>

On 06/02/2012 04:55 AM, Jeff Moyer wrote:
> Junxiao Bi<junxiao.bi@oracle.com>  writes:
>
>> On 05/31/2012 10:08 PM, Jeff Moyer wrote:
>>> Junxiao Bi<junxiao.bi@oracle.com>  writes:
>>>
>>>> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
>>>> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
>>>> io flag is involved in it to serialize the unaligned aio. As
>>>> *private is not initialized in init_sync_kiocb() of do_sync_write(),
>>>> this unaligned io flag may be unexpectly set in an aligned dio.
>>>> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
>>>> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
>>>> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>>>> We can't initialized this flag in ocfs2_file_write_iter() since
>>>> it may be invoked several times by do_sync_write(). So we initialize
>>>> it in init_sync_kiocb(), it's also useful for other similiar use of
>>>> it in the future.
>>> I don't see any ocfs2_file_write_iter in the upstream kernel.
>>> ocfs2_file_aio_write most certainly could set ->private to 0, it
>>> will only be called once for a given kiocb.
>>  From sys_io_submit->..->io_submit_one->aio_run_iocb->aio_rw_vect_retry,
>> it seems that aio_write could be called two times. See the following
>> scenario.
>> 1. There is a file opened with direct io flag, in aio_rw_vect_retry,
>> aio_write is called first time. If the direct io can
>> not be completed, it will fall back into buffer io, see line 2329 in
>> aio_write.
> Huh?  What's line 2329 in aio_write?
See the following code.

2312         can_do_direct = direct_io;
2313         ret = ocfs2_prepare_inode_for_write(file, ppos,
2314                                             iocb->ki_left, appending,
2315 &can_do_direct, &has_refcount);
2316         if (ret < 0) {
2317                 mlog_errno(ret);
2318                 goto out;
2319         }
2320
2321         if (direct_io && !is_sync_kiocb(iocb))
2322                 unaligned_dio = ocfs2_is_io_unaligned(inode, 
iocb->ki_left,
2323                                                       *ppos);
2324
2325         /*
2326          * We can't complete the direct I/O as requested, fall back to
2327          * buffered I/O.
2328          */
2329         if (direct_io && !can_do_direct) {
2330                 ocfs2_rw_unlock(inode, rw_level);
2331
2332                 have_alloc_sem = 0;
2333                 rw_level = -1;
2334
2335                 direct_io = 0;
2336                 goto relock;
2337         }

The above is the source code how direct io falled back to buffer io. In 
line 2313, in function ocfs2_prepare_inode_for_write(), it will judge 
whether the direct io can be executed. If not, the variable 
"can_do_direct" will be set to false, then the variable "direct_io" will 
be set to 0 in line 2335. This means that generic_file_buffered_write() 
will be called in the following code, not generic_file_direct_write(), 
see the following code. So if the generic_file_buffered_write() is a 
partial write, then its return value "written" will be made as the 
return value of the aio_write, see line 2439. Then it return back to 
aio_rw_vect_retry(), the condition (ret > 0 && iocb->ki_left > 0 && 
opcode == IOCB_CMD_PWRITEV) is true. Then aio_write will be called 
second time. As the unaligned I/O flag may be set in the kiocb at the 
first time call of aio_write, it may affect the second call of aio_write 
if its direct IO is aligned.

2372         if (direct_io) {
2373                 written = generic_file_direct_write(iocb, iov, 
&nr_segs, *ppos,
2374                                                     ppos, count, 
ocount);
2375                 if (written < 0) {
2376                         ret = written;
2377                         goto out_dio;
2378                 }
2379         } else {
2380                 current->backing_dev_info = 
file->f_mapping->backing_dev_info;
2381                 written = generic_file_buffered_write(iocb, iov, 
nr_segs, *ppos,
2382                                                       ppos, count, 0);
2383                 current->backing_dev_info = NULL;
2384         }

2438         if (written)
2439                 ret = written;
2440         return ret;
>
>> 2. If the very buffer io is a partial write, then it will return back
>> to  aio_rw_vect_retry and issue the second aio_write.
> For the generic case, the fallback to buffered I/O happens in
> __generic_file_aio_write, without bouncing all the way back up the call
> stack to aio_rw_vect_retry.  I see in ocfs2, things are a bit different:
>
> retry->aio_rw_vect_retry->ocfs2_file_aio_write->generic_file_direct_write
>    ->ocfs2_direct_IO->__blockdev_direct_IO
>
> That last function can return 0 if not all of the data was written via
> direct I/O.  At that point, you return all of the way up the chain to
> aio_rw_vect_retry, which checks the return value (ret).  If it was 0,
> then it goes ahead and retries the complete I/O.  How does that make any
> progress?!
>
> Cheers,
> Jeff

WARNING: multiple messages have this Message-ID (diff)
From: Junxiao Bi <junxiao.bi@oracle.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: ocfs2-devel@oss.oracle.com, linux-aio@kvack.org,
	mfasheh@suse.com, jlbec@evilplan.org, bcrl@kvack.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org,
	joe.jin@oracle.com
Subject: Re: [PATCH 1/2] aio: make kiocb->private NUll in init_sync_kiocb()
Date: Sat, 02 Jun 2012 10:59:05 +0800	[thread overview]
Message-ID: <4FC98179.4000400@oracle.com> (raw)
In-Reply-To: <x49r4ty69bw.fsf@segfault.boston.devel.redhat.com>

On 06/02/2012 04:55 AM, Jeff Moyer wrote:
> Junxiao Bi<junxiao.bi@oracle.com>  writes:
>
>> On 05/31/2012 10:08 PM, Jeff Moyer wrote:
>>> Junxiao Bi<junxiao.bi@oracle.com>  writes:
>>>
>>>> Ocfs2 uses kiocb.*private as a flag of unsigned long size. In
>>>> commit a11f7e6 ocfs2: serialize unaligned aio, the unaligned
>>>> io flag is involved in it to serialize the unaligned aio. As
>>>> *private is not initialized in init_sync_kiocb() of do_sync_write(),
>>>> this unaligned io flag may be unexpectly set in an aligned dio.
>>>> And this will cause OCFS2_I(inode)->ip_unaligned_aio decreased
>>>> to -1 in ocfs2_dio_end_io(), thus the following unaligned dio
>>>> will hang forever at ocfs2_aiodio_wait() in ocfs2_file_write_iter().
>>>> We can't initialized this flag in ocfs2_file_write_iter() since
>>>> it may be invoked several times by do_sync_write(). So we initialize
>>>> it in init_sync_kiocb(), it's also useful for other similiar use of
>>>> it in the future.
>>> I don't see any ocfs2_file_write_iter in the upstream kernel.
>>> ocfs2_file_aio_write most certainly could set ->private to 0, it
>>> will only be called once for a given kiocb.
>>  From sys_io_submit->..->io_submit_one->aio_run_iocb->aio_rw_vect_retry,
>> it seems that aio_write could be called two times. See the following
>> scenario.
>> 1. There is a file opened with direct io flag, in aio_rw_vect_retry,
>> aio_write is called first time. If the direct io can
>> not be completed, it will fall back into buffer io, see line 2329 in
>> aio_write.
> Huh?  What's line 2329 in aio_write?
See the following code.

2312         can_do_direct = direct_io;
2313         ret = ocfs2_prepare_inode_for_write(file, ppos,
2314                                             iocb->ki_left, appending,
2315 &can_do_direct, &has_refcount);
2316         if (ret < 0) {
2317                 mlog_errno(ret);
2318                 goto out;
2319         }
2320
2321         if (direct_io && !is_sync_kiocb(iocb))
2322                 unaligned_dio = ocfs2_is_io_unaligned(inode, 
iocb->ki_left,
2323                                                       *ppos);
2324
2325         /*
2326          * We can't complete the direct I/O as requested, fall back to
2327          * buffered I/O.
2328          */
2329         if (direct_io && !can_do_direct) {
2330                 ocfs2_rw_unlock(inode, rw_level);
2331
2332                 have_alloc_sem = 0;
2333                 rw_level = -1;
2334
2335                 direct_io = 0;
2336                 goto relock;
2337         }

The above is the source code how direct io falled back to buffer io. In 
line 2313, in function ocfs2_prepare_inode_for_write(), it will judge 
whether the direct io can be executed. If not, the variable 
"can_do_direct" will be set to false, then the variable "direct_io" will 
be set to 0 in line 2335. This means that generic_file_buffered_write() 
will be called in the following code, not generic_file_direct_write(), 
see the following code. So if the generic_file_buffered_write() is a 
partial write, then its return value "written" will be made as the 
return value of the aio_write, see line 2439. Then it return back to 
aio_rw_vect_retry(), the condition (ret > 0 && iocb->ki_left > 0 && 
opcode == IOCB_CMD_PWRITEV) is true. Then aio_write will be called 
second time. As the unaligned I/O flag may be set in the kiocb at the 
first time call of aio_write, it may affect the second call of aio_write 
if its direct IO is aligned.

2372         if (direct_io) {
2373                 written = generic_file_direct_write(iocb, iov, 
&nr_segs, *ppos,
2374                                                     ppos, count, 
ocount);
2375                 if (written < 0) {
2376                         ret = written;
2377                         goto out_dio;
2378                 }
2379         } else {
2380                 current->backing_dev_info = 
file->f_mapping->backing_dev_info;
2381                 written = generic_file_buffered_write(iocb, iov, 
nr_segs, *ppos,
2382                                                       ppos, count, 0);
2383                 current->backing_dev_info = NULL;
2384         }

2438         if (written)
2439                 ret = written;
2440         return ret;
>
>> 2. If the very buffer io is a partial write, then it will return back
>> to  aio_rw_vect_retry and issue the second aio_write.
> For the generic case, the fallback to buffered I/O happens in
> __generic_file_aio_write, without bouncing all the way back up the call
> stack to aio_rw_vect_retry.  I see in ocfs2, things are a bit different:
>
> retry->aio_rw_vect_retry->ocfs2_file_aio_write->generic_file_direct_write
>    ->ocfs2_direct_IO->__blockdev_direct_IO
>
> That last function can return 0 if not all of the data was written via
> direct I/O.  At that point, you return all of the way up the chain to
> aio_rw_vect_retry, which checks the return value (ret).  If it was 0,
> then it goes ahead and retries the complete I/O.  How does that make any
> progress?!
>
> Cheers,
> Jeff


  reply	other threads:[~2012-06-02  2:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-31  4:12 [Ocfs2-devel] [PATCH 1/2] aio: make kiocb->private NUll in init_sync_kiocb() Junxiao Bi
2012-05-31  4:12 ` Junxiao Bi
2012-05-31  4:12 ` [Ocfs2-devel] [PATCH 2/2] ocfs2: clear unaligned io flag when dio fails Junxiao Bi
2012-05-31  4:12   ` Junxiao Bi
2012-05-31  4:36   ` Joe Jin
2012-05-31 14:09   ` [Ocfs2-devel] " Jeff Moyer
2012-05-31 14:09     ` Jeff Moyer
2012-06-01  1:44     ` [Ocfs2-devel] " Junxiao Bi
2012-06-01  1:44       ` Junxiao Bi
2012-05-31  4:36 ` [PATCH 1/2] aio: make kiocb->private NUll in init_sync_kiocb() Joe Jin
2012-05-31 14:08 ` [Ocfs2-devel] " Jeff Moyer
2012-05-31 14:08   ` Jeff Moyer
2012-06-01  1:41   ` [Ocfs2-devel] " Junxiao Bi
2012-06-01  1:41     ` Junxiao Bi
2012-06-01 20:55     ` [Ocfs2-devel] " Jeff Moyer
2012-06-01 20:55       ` Jeff Moyer
2012-06-02  2:59       ` Junxiao Bi [this message]
2012-06-02  2:59         ` Junxiao Bi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FC98179.4000400@oracle.com \
    --to=junxiao.bi@oracle.com \
    --cc=bcrl@kvack.org \
    --cc=jlbec@evilplan.org \
    --cc=jmoyer@redhat.com \
    --cc=joe.jin@oracle.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.