From: Michael Tokarev <mjt@tls.msk.ru>
To: stable@kernel.org
Cc: linux-ext4@vger.kernel.org
Subject: what happened with dccaf33fa37 "ext4: flush any pending end_io requests before DIO" for 3.0?
Date: Thu, 01 Dec 2011 00:38:07 +0400
Message-ID: <4ED6942F.7070006@msgid.tls.msk.ru>


[-- Attachment #1.1: Type: text/plain, Size: 1722 bytes --]

Hello.

Back in August 2011, the following commit was tagged for inclusion
in stable:

commit dccaf33fa37a1bc5d651baeb3bfeb6becb86597b
Author: Jiaying Zhang <jiayingz@google.com>
Date:   Fri Aug 19 19:13:32 2011 -0400

    ext4: flush any pending end_io requests before DIO reads w/dioread_nolock

    There is a race between ext4 buffer write and direct_IO read with
    dioread_nolock mount option enabled. The problem is that we clear
    PageWriteback flag during end_io time but will do
    uninitialized-to-initialized extent conversion later with dioread_nolock.
    If an O_direct read request comes in during this period, ext4 will return
    zero instead of the recently written data.

    This patch checks whether there are any pending uninitialized-to-initialized
    extent conversion requests before doing O_direct read to close the race.
    Note that this is just a bandaid fix. The fundamental issue is that we
    clear PageWriteback flag before we really complete an IO, which is
    problem-prone. To fix the fundamental issue, we may need to implement an
    extent tree cache that we can use to look up pending to-be-converted extents.

    Signed-off-by: Jiaying Zhang <jiayingz@google.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@kernel.org
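
To make the race described in the commit message concrete, here is a small
userspace sketch in C.  It is only a toy model under stated assumptions, not
kernel code: struct toy_inode, toy_flush_completed_io() and toy_dio_read() are
hypothetical names, and the conversion_pending flag merely stands in for a
non-empty i_completed_io_list.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model (hypothetical, userspace only): the data has reached the
 * "disk", but the extent is still marked uninitialized until a deferred
 * end_io conversion runs.
 */
struct toy_inode {
	char disk[8];            /* data already written to disk */
	bool extent_initialized; /* false: reads must return zeros */
	bool conversion_pending; /* stands in for a non-empty i_completed_io_list */
};

/* Stand-in for ext4_flush_completed_IO(): run the deferred conversion. */
static void toy_flush_completed_io(struct toy_inode *inode)
{
	if (inode->conversion_pending) {
		inode->extent_initialized = true;
		inode->conversion_pending = false;
	}
}

/*
 * Direct read of the first len bytes.  An uninitialized extent reads
 * back as zeros.  With flush_first set, pending conversions are
 * completed before the read, which is the idea behind the fix.
 */
static void toy_dio_read(struct toy_inode *inode, char *buf, size_t len,
			 bool flush_first)
{
	assert(len <= sizeof(inode->disk));

	if (flush_first && inode->conversion_pending)
		toy_flush_completed_io(inode);

	if (inode->extent_initialized)
		memcpy(buf, inode->disk, len);
	else
		memset(buf, 0, len);
}
```

In this model, calling toy_dio_read() with flush_first == false while a
conversion is pending returns zeros even though the data is on disk; calling
it with flush_first == true returns the written data.  The real patch does the
equivalent check under i_mutex, which the sketch leaves out.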


One more ext4 commit from that time made its way into stable,
but this one did not.

I wonder whether the reason is that it needed a small "backport" for 3.0:
in 3.1+ the code was moved into another file, and the context is slightly
different.  If so, attached is the "backport" which we have been using
with 3.0.x since then.

Thanks!

/mjt

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: ext4-flush-any-pending-end_io-requests-before-DIO-reads-w-dioread_nolock-dccaf33fa3.diff --]
[-- Type: text/x-diff; name="ext4-flush-any-pending-end_io-requests-before-DIO-reads-w-dioread_nolock-dccaf33fa3.diff", Size: 2110 bytes --]

(backported to 3.0 by mjt)

commit dccaf33fa37a1bc5d651baeb3bfeb6becb86597b
Author: Jiaying Zhang <jiayingz@google.com>
Date:   Fri Aug 19 19:13:32 2011 -0400

    ext4: flush any pending end_io requests before DIO reads w/dioread_nolock
    
    There is a race between ext4 buffer write and direct_IO read with
    dioread_nolock mount option enabled. The problem is that we clear
    PageWriteback flag during end_io time but will do
    uninitialized-to-initialized extent conversion later with dioread_nolock.
    If an O_direct read request comes in during this period, ext4 will return
    zero instead of the recently written data.
    
    This patch checks whether there are any pending uninitialized-to-initialized
    extent conversion requests before doing O_direct read to close the race.
    Note that this is just a bandaid fix. The fundamental issue is that we
    clear PageWriteback flag before we really complete an IO, which is
    problem-prone. To fix the fundamental issue, we may need to implement an
    extent tree cache that we can use to look up pending to-be-converted extents.
    
    Signed-off-by: Jiaying Zhang <jiayingz@google.com>
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
    Cc: stable@kernel.org

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index b8602cd..0962642 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3507,12 +3507,17 @@ ssize_t ext4_ind_direct_IO(int rw, struct kiocb *iocb,
 	}
 
 retry:
-	if (rw == READ && ext4_should_dioread_nolock(inode))
+	if (rw == READ && ext4_should_dioread_nolock(inode)) {
+		if (unlikely(!list_empty(&ei->i_completed_io_list))) {
+			mutex_lock(&inode->i_mutex);
+			ext4_flush_completed_IO(inode);
+			mutex_unlock(&inode->i_mutex);
+		}
 		ret = __blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
 				 ext4_get_block, NULL, NULL, 0);
-	else {
+	} else {
 		ret = blockdev_direct_IO(rw, iocb, inode,
 				 inode->i_sb->s_bdev, iov,
 				 offset, nr_segs,
 

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 316 bytes --]
