linux-raid.vger.kernel.org archive mirror
From: scjody@sun.com
To: linux-ext4@vger.kernel.org, linux-raid@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Andreas Dilger <adilger@sun.com>
Subject: [patch 0/5] Journal guided resync and support
Date: Thu, 19 Nov 2009 16:22:41 -0500
Message-ID: <20091119212241.283629302@sun.com>

This is an updated implementation of journal guided resync, intended to be
suitable for production systems.  This feature addresses the problem with RAID
arrays that take too long to resync - similar to the existing MD write-intent
bitmap feature, we resync only the stripes that were undergoing writes at the
time of the crash.  Unlike write-intent bitmaps, our testing shows very little
performance degradation as a result of the feature - around 3-5% vs around 30%
for bitmaps.

This feature is based on work described in this paper:
http://www.usenix.org/events/fast05/tech/denehy.html

In summary, we introduce a new data write mode known as declared mode.  This
is based on ordered mode except that a list of blocks to be written during the
current transaction is added to the journal before the blocks themselves are
written to the disk.  Then, if the system crashes, we can resync only those
blocks during journal replay and skip the rest of the resync of the RAID array.
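
The declare-then-write ordering described above can be illustrated with a
small toy model.  This is only a sketch and not the code in these patches:
the names ToyJournal, declare, write, stripes_to_resync and the
STRIPE_BLOCKS constant are all hypothetical, and for simplicity every
declare record still in the journal is treated as needing resync.

```python
STRIPE_BLOCKS = 4  # assumption: number of filesystem blocks per RAID stripe


class ToyJournal:
    """Toy stand-in for a journal that records declared blocks (hypothetical)."""

    def __init__(self):
        self.declared = []  # declare records, each a sorted list of block numbers
        self.disk = {}      # block number -> data that actually reached the disk

    def declare(self, blocks):
        # The list of blocks is added to the journal before any data is written.
        self.declared.append(sorted(set(blocks)))

    def write(self, block, data):
        # Refuse to write a block that has not been declared first.
        if not any(block in record for record in self.declared):
            raise ValueError(f"block {block} was not declared")
        self.disk[block] = data


def stripes_to_resync(journal):
    """At replay time, map every declared block to the stripe that contains it."""
    stripes = set()
    for record in journal.declared:
        for block in record:
            stripes.add(block // STRIPE_BLOCKS)
    return sorted(stripes)


journal = ToyJournal()
journal.declare([5, 6, 17])
journal.write(5, b"a")  # simulated crash here: blocks 6 and 17 never reach disk
print(stripes_to_resync(journal))  # [1, 4]
```

In this model only stripes 1 and 4 are resynced after the simulated crash,
regardless of how many stripes the array has.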

The changes consist of patches to ext3, jbd, MD, and the raid456 personality.
These patches are currently against the RHEL 5 kernel 2.6.18-128.7.1.  Porting
to ext4/jbd2 and a more modern kernel is a TODO item.

Changes since the previous set of patches: I have addressed all review comments
received.  Notable is a design change based on Neil Brown's suggestions: the
filesystem now sets a buffer flag (fs_raidsync) to inform MD that the
filesystem is taking responsibility for resyncing parity on this stripe in
the event of a system crash.  For RAID 4/5/6, setting this flag causes the
write-intent bitmap NOT to be updated for the write in question.  There is
also a buffer flag (syncraid) used by jbd to resync parity.  Together these
eliminate most of the need for ioctls, though one is still needed for e2fsck.
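
As a rough illustration of the fs_raidsync behaviour described above, the
following toy model shows which writes would still be recorded in the
write-intent bitmap.  It is a sketch under assumptions, not the MD code:
ToyWrite and bitmap_stripes are hypothetical names, and the handling of
RAID levels other than 4/5/6 (the flag is simply ignored) is an assumption
not stated in this message.

```python
from dataclasses import dataclass


@dataclass
class ToyWrite:
    """Hypothetical stand-in for a write request reaching the RAID layer."""
    stripe: int
    fs_raidsync: bool = False  # filesystem takes responsibility for this stripe


def bitmap_stripes(writes, raid_level):
    """Return the stripes this toy model would mark in the write-intent bitmap."""
    marked = set()
    for w in writes:
        # For RAID 4/5/6, a flagged write does not update the bitmap.
        if raid_level in (4, 5, 6) and w.fs_raidsync:
            continue
        # Assumption: on other levels the flag has no effect.
        marked.add(w.stripe)
    return sorted(marked)


writes = [ToyWrite(1, fs_raidsync=True), ToyWrite(2), ToyWrite(3, fs_raidsync=True)]
print(bitmap_stripes(writes, raid_level=5))  # [2]
```

Here only the unflagged write to stripe 2 touches the bitmap; the flagged
stripes are left for the filesystem to resync during journal replay.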

Unfortunately, we have determined that these patches are NOT useful to Lustre.
Therefore I will not be doing any more work on them.  I am sending them now in
case they are useful as a starting point for someone else's work.

Cheers,
Jody


Thread overview: 8+ messages
2009-11-19 21:22 scjody [this message]
2009-11-19 21:22 ` [patch 1/5] [md] Add fs_raidsync buffer & bio flags scjody
2009-11-19 21:22 ` [patch 2/5] [md] Add syncraid " scjody
2009-11-19 21:22 ` [patch 3/5] [jbd] Add support for journal guided resync scjody
2009-11-19 21:22 ` [patch 4/5] [ext3] Add journal guided resync (data=declared mode) scjody
2009-11-19 21:22 ` [patch 5/5] [md] Add SET_RESYNC_ALL and CLEAR_RESYNC_ALL ioctls scjody
2009-11-24 11:43 ` [patch 0/5] Journal guided resync and support Pavel Machek
2009-11-24 18:51   ` Andreas Dilger
