From: Chris Mason <chris.mason@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>,
	Dave Chinner <david@fromorbit.com>,
	Joel Becker <jlbec@evilplan.org>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mingming Cao <mcao@us.ibm.com>,
	linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: [RFC] block integrity: Fix write after checksum calculation problem
Date: Mon, 21 Mar 2011 10:24:41 -0400
Message-ID: <1300716666-sup-2087@think>
In-Reply-To: <20110321140451.GA7153@quack.suse.cz>

Excerpts from Jan Kara's message of 2011-03-21 10:04:51 -0400:
> On Fri 18-03-11 17:07:55, Darrick J. Wong wrote:
> > > > Ok, here's what I have so far.  I took everyone's suggestions of where to add
> > > > calls to wait_on_page_writeback, which seems to handle the multiple-write case
> > > > adequately.  Unfortunately, it is still possible to generate checksum errors by
> > > > scribbling furiously on a mmap'd region, even after adding the writeback wait
> > > > in the ext4 writepage function.  Oddly, I couldn't break btrfs with mmap by
> > > > removing its wait_for_page_writeback call, so I suspect there's a bit more
> > > > going on in btrfs than I've been able to figure out.
> > 
> > I wonder, is it possible for this to happen:
> > 
> > 1. Thread A mmaps a page and tries to write to it.  ext4_page_mkwrite executes,
> >    but there's no ongoing writeback, so it returns without delay.
> > 2. Thread A starts writing furiously to the page.
> > 3. Thread B runs fsync() or something that results in the page being
> >    checksummed and scheduled for writeout.
> > 4. Thread A continues to write furiously(!) on that same page before the
> >    controller finishes the DMA transfer.
> > 5. Disk gets the page, which now doesn't match its checksum, and *boom*
>   What happens on writepage (see mm/page-writeback.c:write_cache_pages())
> is:
>   lock_page(page)
>   ...
>   clear_page_dirty_for_io() - removes PageDirty, marks page as read-only in
>     PTE
>   ...
>   set_page_writeback() (happens e.g. in __block_write_full_page() called
> from filesystem's writepage implementation).
>   unlock_page(page)
> 
>   So if you compute the checksum after set_page_writeback() is done in the
> writepage() implementation (you cannot use __block_write_full_page() in
> that case) and you call wait_on_page_writeback() in ext4_page_mkwrite()
> under page lock, you should be safe. If you do all this and still see
> errors, something is broken I'd say...

Looking at ext4_page_mkwrite, it does this:

lock the page
check for holes
unlock the page
if (no_holes)
	return;

write_begin/write_end
return

So, to have page_mkwrite work, you need to wait for writeback with the
page locked in both the no-holes case and after the
write_begin/write_end.  write_begin will dirty the page, so someone can
wander in and start the IO while we are still in page_mkwrite.
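
To make the ordering concrete, here is a rough user-space sketch of the
same idea with pthreads.  It is not kernel code: every toy_* name is
made up for illustration, a mutex stands in for the page lock, and a
flag plus condition variable stand in for PageWriteback and
wait_on_page_writeback().  The toy also does the store itself under the
lock, which is a simplification of "the page only becomes writable once
no writeback is in flight".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for a page; none of this is kernel API. */
struct toy_page {
	pthread_mutex_t lock;    /* plays the role of lock_page()/unlock_page() */
	pthread_cond_t wb_done;  /* signalled when the toy "writeback" ends */
	bool writeback;          /* plays the role of PageWriteback */
	unsigned char data[64];
};

/* Simple additive checksum, standing in for the integrity checksum. */
static unsigned int toy_checksum(const struct toy_page *pg)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < sizeof(pg->data); i++)
		sum += pg->data[i];
	return sum;
}

/* Writeout side: set the writeback flag under the lock, then checksum. */
static unsigned int toy_start_writeback(struct toy_page *pg)
{
	unsigned int sum;

	pthread_mutex_lock(&pg->lock);
	pg->writeback = true;
	sum = toy_checksum(pg);
	pthread_mutex_unlock(&pg->lock);
	return sum;
}

/* IO completion: clear the flag and wake anyone waiting on it. */
static void toy_end_writeback(struct toy_page *pg)
{
	pthread_mutex_lock(&pg->lock);
	assert(pg->writeback);
	pg->writeback = false;
	pthread_cond_broadcast(&pg->wb_done);
	pthread_mutex_unlock(&pg->lock);
}

/* Fault side: wait for writeback with the lock held, then modify. */
static void *toy_mkwrite(void *arg)
{
	struct toy_page *pg = arg;

	pthread_mutex_lock(&pg->lock);
	while (pg->writeback)
		pthread_cond_wait(&pg->wb_done, &pg->lock);
	memset(pg->data, 0xff, sizeof(pg->data));
	pthread_mutex_unlock(&pg->lock);
	return NULL;
}

/* Returns 1 if the data stayed stable for the whole toy "writeback". */
int toy_demo(void)
{
	struct toy_page pg;
	pthread_t writer;
	unsigned int before, after;
	int ok;

	memset(&pg, 0, sizeof(pg));
	memset(pg.data, 0x5a, sizeof(pg.data));
	pthread_mutex_init(&pg.lock, NULL);
	pthread_cond_init(&pg.wb_done, NULL);

	before = toy_start_writeback(&pg);
	if (pthread_create(&writer, NULL, toy_mkwrite, &pg) != 0)
		return 0;
	after = toy_checksum(&pg);  /* the "device" reads the page */
	toy_end_writeback(&pg);
	pthread_join(writer, NULL);

	ok = (before == after) && (pg.data[0] == 0xff);
	pthread_cond_destroy(&pg.wb_done);
	pthread_mutex_destroy(&pg.lock);
	return ok;
}
```

Build with -pthread and call toy_demo().  The writer thread cannot touch
the data until toy_end_writeback() runs, so the checksum taken at the
start still matches what the "device" reads.  Drop the wait loop in
toy_mkwrite() and that guarantee goes away.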

The patch below is untested and uncompiled, but it should do the trick.

Jan, did you get rid of all the buffer head based writeback for
data=ordered in ext4?  That's my only other idea, that someone is doing
writeback directly without taking the page lock.

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 9f7f9e4..8a75e12 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5880,6 +5880,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (page_has_buffers(page)) {
 		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
 					ext4_bh_unmapped)) {
+			wait_on_page_writeback(page);
 			unlock_page(page);
 			goto out_unlock;
 		}
@@ -5901,6 +5902,16 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (ret < 0)
 		goto out_unlock;
 	ret = 0;
+
+	/*
+	 * write_begin/end might have created a dirty page and someone
+	 * could wander in and start the IO.  Make sure that hasn't
+	 * happened
+	 */
+	lock_page(page);
+	wait_on_page_writeback(page);
+	unlock_page(page);
+
 out_unlock:
 	if (ret)
 		ret = VM_FAULT_SIGBUS;
