linux-mm.kvack.org archive mirror
From: "Darrick J. Wong" <djwong@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Theodore Tso <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Jeff Layton <jlayton@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Christoph Hellwig <hch@infradead.org>,
	linux-mm@kvack.org, Chris Mason <chris.mason@oracle.com>,
	Joel Becker <jlbec@evilplan.org>,
	linux-scsi <linux-scsi@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-ext4@vger.kernel.org, Mingming Cao <mcao@us.ibm.com>
Subject: Re: [PATCHSET v3.1 0/7] data integrity: Stabilize pages during writeback for various fses
Date: Mon, 16 May 2011 12:04:27 -0700
Message-ID: <20110516190427.GN20579@tux1.beaverton.ibm.com>
In-Reply-To: <20110509230318.19566.66202.stgit@elm3c44.beaverton.ibm.com>

On Mon, May 09, 2011 at 04:03:18PM -0700, Darrick J. Wong wrote:
> Hi all,
> 
> This is v3.1 of the stable-page-writes patchset for ext4/3/2, xfs, and fat.
> The purpose of this patchset is to prohibit processes from writing on memory
> pages that are currently being written to disk because certain storage setups
> (e.g. SCSI disks with DIF integrity checksums) will fail a write if the page
> contents don't match the checksum.  btrfs already guarantees page stability, so
> it does not use these changes.
> 
> The technique used is fairly simple -- whenever a page is about to become
> writable (either because of a write fault to a mapped page, or a buffered write
> is in progress), wait for the page writeback flag to be clear, indicating that
> the page is not being written to disk.  This means that (1)
> grab_cache_page_write_begin must wait for writeback to take care of buffered
> writes, and (2) every filesystem must have a page_mkwrite that locks a page,
> waits for writeback, and returns the locked page.  For filesystems that
> piggyback on the generic block_page_mkwrite, the patchset adds the writeback
> wait to that function; for filesystems that do not use the page_mkwrite hook at
> all, the patchset provides a stub page_mkwrite.
> 
> I ran my write-after-checksum ("wac") reproducer program to try to create the
> DIF checksum errors by madly rewriting the same memory pages.  In fact, I tried
> the following combinations against ext2/3/4, xfs, btrfs, and vfat:
> 
> a. 64 write() threads + sync_file_range
> b. 64 mmap write threads + msync
> c. 32 write() threads + sync_file_range + 32 mmap write threads + msync
> d. Same as C, but with all threads in directio mode
> e. Same as A, but with all threads in directio mode
> f. Same as B, but with all threads in directio mode
> 
> After running profiles A-F for 30 minutes each on 6 different machines, ext2,
> ext4, xfs, and vfat reported no errors.  ext3 still has a lingering failure
> case (which I will touch on briefly later) and btrfs eventually reports -ENOSPC
> and fails the test, though it does that even without any of the patches applied.
> 
> To assess the performance impact of stable page writes, I moved to a disk that
> doesn't have DIF support so that I could measure just the impact of waiting for
> writeback.  I first ran wac with 64 threads madly scribbling on a 64k file and
> saw about a 12 percent performance decrease.  I then reran the wac program with
> 64 threads and a 64MB file and saw about the same performance numbers.  As I
> suspected, the patchset only seems to impact workloads that rewrite the same
> memory page frequently.
> 
> I am still chasing down what exactly is broken in ext3.  data=writeback mode
> passes with no failures.  data=ordered, however, does not pass; my current
> suspicion is that jbd is calling submit_bh on data buffers but doesn't call
> page_mkclean to kick the userspace programs off the page before writing it.
> 
> Per various comments regarding v3 of this patchset, I've integrated the
> reviewers' suggestions, reworked the patch descriptions to make it clearer which ones
> touch all the filesystems and which ones are to fix remaining holes in specific
> filesystems, and expanded the scope of filesystems that got fixed.
> 
> As always, questions and comments are welcome; and thank you to all the
> previous reviewers of this patchset.  I am also soliciting people's opinions on
> whether or not these patches could go upstream for .40.
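
For anyone reading along, the wait-for-writeback technique described in the
quoted text can be modeled in userspace roughly as follows.  This is only a
sketch and not code from the patchset: struct model_page, every model_*
function, and the toy checksum are hypothetical stand-ins for struct page,
the page writeback flag, and the DIF checksum, and a pthread mutex and
condition variable stand in for the page lock and the writeback wait.

```c
/* Userspace model of stable page writes; NOT kernel code. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a page cache page and its contents. */
struct model_page {
	pthread_mutex_t lock;          /* models the page lock */
	pthread_cond_t writeback_done; /* signalled when I/O completes */
	bool writeback;                /* models the page writeback flag */
	unsigned char data[MODEL_PAGE_SIZE];
};

static void model_page_init(struct model_page *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->writeback_done, NULL);
	p->writeback = false;
	memset(p->data, 0, sizeof(p->data));
}

/* Toy checksum standing in for the integrity checksum.  The caller must
 * ensure that nobody is modifying the page concurrently. */
static uint32_t model_checksum(const struct model_page *p)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(p->data); i++)
		sum = sum * 31 + p->data[i];
	return sum;
}

/* Submit the page for I/O: set the flag and checksum the contents. */
static uint32_t model_start_writeback(struct model_page *p)
{
	uint32_t sum;

	pthread_mutex_lock(&p->lock);
	assert(!p->writeback);
	p->writeback = true;
	sum = model_checksum(p);
	pthread_mutex_unlock(&p->lock);
	return sum;
}

/* I/O finished: clear the flag and wake any waiting writers. */
static void model_end_writeback(struct model_page *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->writeback);
	p->writeback = false;
	pthread_cond_broadcast(&p->writeback_done);
	pthread_mutex_unlock(&p->lock);
}

/* Write path: lock the page, wait for writeback to clear, then modify. */
static void model_stable_write(struct model_page *p, unsigned char byte)
{
	pthread_mutex_lock(&p->lock);
	while (p->writeback)
		pthread_cond_wait(&p->writeback_done, &p->lock);
	memset(p->data, byte, sizeof(p->data));
	pthread_mutex_unlock(&p->lock);
}

/* Thread entry point that scribbles 0xAA over the whole page. */
static void *model_writer_thread(void *arg)
{
	model_stable_write(arg, 0xAA);
	return NULL;
}
```

In this model a writer that calls model_stable_write() while the flag is set
blocks until model_end_writeback() runs, so the contents that were
checksummed at submission time cannot change underneath the I/O.  The cost is
the same one the performance numbers above point at: writers that keep
hitting a page under writeback spend their time waiting.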

[adding Andrew Morton to cc]

On today's ext4 community call, Ted, Mingming, and I discussed how we might
get this patchset pushed upstream.  The ext2 patch can be dropped since it
really was only there as proof that the generic mm/fs fixes actually worked.
I'm unsure of the vfat maintainer's feelings on the patchset, though he seems
concerned about the performance of apps that rewrite pages frequently.  Ted
seemed agreeable to the ext4 changes, though I don't know if he's reviewed
them thoroughly yet.

Ted asked for clarification as to which patches are needed to fix ext4.
Patches 1 and 5 are the two that are required for ext4, and patch 4 cleans up
some ext4 code.

Patches 1 and 2 are needed to fix xfs and nilfs.

Patches 1 and 3 (and fs-specific patches such as 6 & 7) are needed to fix the
rest.

Unfortunately, Ted and I weren't sure who (if anyone) is in charge of pushing
mm and generic fs patches upstream.  Ted suggested Andrew Morton for the mm
patches (1 & 3) and we weren't sure if Al Viro or Christoph (or someone else)
is in charge of generic vfs patches (patch #2).

So, who do I actually ask to take the mm and vfs patches?  Andrew, can I send
you patches?

--D


