From: "bmoon" <bo@anthologysolutions.com>
To: neilb@cse.unsw.edu.au
Cc: Philip Cameron <pecameron@attbi.com>, linux-raid@vger.kernel.org
Subject: Re: Fast RAID 1 Resync
Date: Thu, 16 Jan 2003 15:22:39 -0800
Message-ID: <029901c2bdb6$299abee0$6a01a8c0@bmoon>
In-Reply-To: 3E14E5D8.70409@attbi.com
Neil,
It is very interesting, but I have a couple of questions on your suggestion,
as follows:
1) Does each bit in the bitmap correspond to one chunk?
2) There is a counter (or timestamp) for each chunk on each mirror, right?
Then how do we tell whether the timestamp matches or not? What do you
compare it with?
3) I think "a bit in the bitmap has the same meaning as the MSB (high bit) of
the counter".
Does that mean you only need a counter for each chunk? Am I wrong?
Bo
----- Original Message -----
From: "Philip Cameron" <pecameron@attbi.com>
To: <linux-raid@vger.kernel.org>; <neilb@cse.unsw.edu.au>
Sent: Thursday, January 02, 2003 5:22 PM
Subject: Re: Fast RAID 1 Resync
> Hi Neil,
>
> (Sorry if this is a repost. I had an error with returned mail)
>
> Thanks for your comments.
>
> I also don't see a need to synchronize disks at mkraid time. It's nice to
> have identical disks, but not necessary as long as the result of reading
> a sector that has never been written is undefined. An option to support
> either approach can be added as long as there is a real need. Adding an
> option increases complexity, especially during testing.
>
> I have been thinking of tracking writes to each chunk with a counter. The
> counters would be organized into a vector indexed by chunk number. Before
> a write starts, the counter is incremented by the number of mirrors,
> including any currently unavailable mirrors. It is decremented as each
> write completes, so when all writes in a chunk are complete the counter
> returns to zero. If there is a missing mirror, the counter will not
> return to zero (since one of the needed writes was not done).
>
> When a disk is pulled, the counters increment but don't return to zero.
> When the disk is reinserted, the resync needs to copy the chunks whose
> counters did not return to zero. When the resync of a chunk is complete,
> set its counter to zero.
>
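The counter scheme described above could be sketched roughly as follows. This is only an illustration under my own assumptions: the class and method names (ChunkCounters, start_write, complete_write, and so on) are hypothetical and do not come from the md driver, and the list of chunks to resync is only meaningful once in-flight writes have drained.

```python
class ChunkCounters:
    """Per-chunk outstanding-write counters for an N-way mirror (hypothetical names)."""

    def __init__(self, num_chunks, num_mirrors):
        self.num_mirrors = num_mirrors
        self.counters = [0] * num_chunks  # vector indexed by chunk number

    def start_write(self, chunk):
        # Count one pending write per configured mirror, including absent ones.
        self.counters[chunk] += self.num_mirrors

    def complete_write(self, chunk):
        # Called once for each mirror that actually finished its write.
        self.counters[chunk] -= 1

    def chunks_to_resync(self):
        # With no writes in flight, a non-zero counter means a mirror missed a write.
        return [i for i, count in enumerate(self.counters) if count != 0]

    def resync_done(self, chunk):
        # After the chunk has been copied to the returning disk.
        self.counters[chunk] = 0


counters = ChunkCounters(num_chunks=4, num_mirrors=2)
counters.start_write(1)
counters.complete_write(1)
counters.complete_write(1)   # both mirrors wrote chunk 1
counters.start_write(3)
counters.complete_write(3)   # only one mirror wrote chunk 3
pending = counters.chunks_to_resync()
```

Here chunk 3 is the only one left with a non-zero counter, so it is the only chunk a resync would need to copy.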
> To deal with recovery after a crash, I am thinking about using your
> approach. Use a bit per counter and set the bit when the counter is
> non-zero. When a bit goes from 0 to 1, the updated bit vector is written
> before starting the write to the chunk. On reboot after a crash, the bit
> vector from the selected mirror is used (the current mechanism is used to
> select the base disk). The counter is incremented for each chunk that has
> its bit set. After this, the resync described above can be performed. I
> don't see a need for timestamps beyond what is currently being done.
>
> The transitions from 1 to 0 are not all that important, since the worst
> case is syncing a chunk that is already mirrored. Overall performance can
> be improved by delaying the 1 to 0 transition for a few seconds. A lazy
> write of the bits can be done every 10 seconds or so if there are no 0 to
> 1 changes during that interval.
>
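A minimal sketch of the bit-per-counter idea with eager 0 to 1 writes and lazy 1 to 0 writes might look like this. The names are hypothetical, the on_disk list merely stands in for the persisted bit vector, and the 10-second timer that would call lazy_flush is assumed to live elsewhere.

```python
class DirtyBitmap:
    """Bit per chunk: flushed eagerly on 0 -> 1, lazily on 1 -> 0 (hypothetical names)."""

    def __init__(self, num_chunks, num_mirrors):
        self.num_mirrors = num_mirrors
        self.counters = [0] * num_chunks
        self.bits = [False] * num_chunks     # in-memory bit vector
        self.on_disk = [False] * num_chunks  # stand-in for the persisted copy
        self.flushes = 0

    def flush(self):
        self.on_disk = list(self.bits)
        self.flushes += 1

    def start_write(self, chunk):
        self.counters[chunk] += self.num_mirrors
        self.bits[chunk] = True
        if not self.on_disk[chunk]:
            # The 0 -> 1 transition must reach disk before the data write starts.
            self.flush()

    def complete_write(self, chunk):
        self.counters[chunk] -= 1
        if self.counters[chunk] == 0:
            # 1 -> 0 is only recorded in memory; the disk copy is updated lazily.
            self.bits[chunk] = False

    def lazy_flush(self):
        # Intended to be called from a periodic timer (assumed, not shown).
        if self.bits != self.on_disk:
            self.flush()

    @classmethod
    def recover(cls, on_disk, num_mirrors):
        # After a crash, bump the counter of every chunk whose bit was set.
        bitmap = cls(len(on_disk), num_mirrors)
        bitmap.bits = list(on_disk)
        bitmap.on_disk = list(on_disk)
        bitmap.counters = [1 if bit else 0 for bit in on_disk]
        return bitmap
```

A stale 1 on disk only costs a redundant resync of that chunk after a crash, which is why the 1 to 0 write can be delayed, while a missing 1 would lose track of a chunk that may be out of sync.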
> I have a third goal: minimize the resync time for a new (replacement)
> disk. In this case all of the chunks that have ever been written need to
> be copied. I have been thinking about controlling this through a second
> bit vector. When a chunk is written for the first time the bit is set,
> and it is never reset. When the new disk is added, the counters
> corresponding to all of the bits that have been set in the vector are
> incremented and a resync (as above) is performed. As above, the vector
> update needs to be done before the write to the chunk. The length of the
> resync is proportional to how much of the disk has been used.
>
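The second bit vector for replacement disks could be sketched as below. Both function names are hypothetical, and the actual flush of the vector to disk is left to the caller.

```python
def note_first_write(ever_written, chunk):
    """Mark a chunk as used; return True if the vector must be flushed
    to disk before the data write proceeds (hypothetical helper)."""
    if ever_written[chunk]:
        return False
    ever_written[chunk] = True  # set once, never reset
    return True


def chunks_for_new_disk(counters, ever_written):
    """When a replacement disk is added, bump the counter of every chunk
    that has ever been written and return the chunks needing a copy."""
    for chunk, written in enumerate(ever_written):
        if written:
            counters[chunk] += 1
    return [chunk for chunk, count in enumerate(counters) if count != 0]


ever_written = [False] * 4
counters = [0] * 4
first = note_first_write(ever_written, 1)    # first write to chunk 1
repeat = note_first_write(ever_written, 1)   # later write, no flush needed
note_first_write(ever_written, 3)
to_copy = chunks_for_new_disk(counters, ever_written)
```

Only chunks 1 and 3 are copied to the new disk in this example, which matches the point that resync time scales with how much of the disk has been used.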
> A hardware note: the system has two IO assemblies, each of which contains
> a PCI bus, a SCSI HBA and 3 hot-pluggable SCSI disk slots. We are using
> 72GB disks. Sets of 2-mirror RAID1 arrays are the most practical
> configuration.
>
> Phil Cameron
>
> >>
> >> Hi,
> >
> >
> >> You have two quite different, though admittedly similar, goals here.
> >> 1/ quick resync when a recently removed drive is re-added.
> >> 2/ quick resync after an unclean shutdown.
> >>
> >> I would treat these quite separately.
> >>
> >> For the latter I would have a bitmap which was written to disk
> >> whenever a bit was set and eventually after a bit was cleared.
> >> I would use a 16 bit counter for each 'chunk'.
> >> If the high bit is clear, it stores the number of outstanding writes
> >> on that chunk.
> >> If the high bit is set, it stores some sort of time stamp of when the
> >> number of outstanding writes hit zero.
> >> Every time you write the bitmap, you increment this timestamp.
> >> So when you schedule a write, you only need to write out the bitmap
> >> first if the 16bit number of this chunk has the highbit set and has a
> >> timestamp which is different to the current one - which means that
> >> the bitmap has been written out with a zero in this slot.
> >> So:
> >>   On write, if highbit clear, increment counter
> >>     if highbit set and timestamp matches, set counter to 1
> >>       and set bit in bitmap
> >>     if highbit set and timestamp doesn't match, set
> >>       bit in bitmap, schedule write, set counter to 1
> >>   On write complete,
> >>     decrement counter. If it hits zero, set to timestamp with
> >>     high bit set, clear the bitmap bit, and schedule a bitmap
> >>     writeout a few seconds hence.
> >>
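One way to read the 16-bit slot rules above is sketched here. This is my interpretation, not code from the md driver: the names are hypothetical, the timestamp is compared against a single current stamp that is bumped on every bitmap write, idle slots are assumed to start with a stamp that does not match, and neither stamp wraparound nor counter overflow is handled. The delayed writeout is left to the caller.

```python
HIGH_BIT = 0x8000    # set: low 15 bits hold a timestamp; clear: outstanding writes
STAMP_MASK = 0x7FFF


class TimestampedSlots:
    """16-bit slot per chunk, per the rules quoted above (hypothetical names)."""

    def __init__(self, num_chunks):
        self.stamp = 1  # current timestamp, bumped on every bitmap write
        # Assumption: idle slots start with stamp 0, which does not match stamp 1.
        self.slots = [HIGH_BIT | 0] * num_chunks
        self.bits = [False] * num_chunks     # in-memory bitmap
        self.on_disk = [False] * num_chunks  # stand-in for the persisted bitmap
        self.bitmap_writes = 0

    def write_bitmap(self):
        self.on_disk = list(self.bits)
        self.stamp = (self.stamp + 1) & STAMP_MASK
        self.bitmap_writes += 1

    def start_write(self, chunk):
        slot = self.slots[chunk]
        if not slot & HIGH_BIT:
            self.slots[chunk] = slot + 1  # more writes already outstanding
            return
        self.bits[chunk] = True
        if (slot & STAMP_MASK) != self.stamp:
            # The bitmap has been written since this chunk went idle, so the
            # disk copy shows 0 here and must be updated before the data write.
            self.write_bitmap()
        self.slots[chunk] = 1

    def complete_write(self, chunk):
        self.slots[chunk] -= 1
        if self.slots[chunk] == 0:
            self.slots[chunk] = HIGH_BIT | self.stamp
            self.bits[chunk] = False
            # A delayed write_bitmap() would be scheduled here (not modelled).
```

Under this reading, a chunk that goes idle and is written again before the next bitmap write costs no extra bitmap I/O, because its stored stamp still equals the current one.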
> >> For the former I would just hold a separate bitmap, one bit per
> >> chunk.
> >> While all drives are working, this bitmap would be all zeros.
> >> Whenever a write fails to write to all drives, the relevant bit gets
> >> set.
> >> When a recently failed drive comes back online, we resync all chunks
> >> that have that bit set.
> >>
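The separate one-bit-per-chunk bitmap for a removed drive could be sketched like this. The function names and the copy_chunk callback are hypothetical, and clearing the bit after the copy assumes only one drive was missing.

```python
def record_write_result(degraded, chunk, mirrors_written, num_mirrors):
    """Set the chunk's bit if the write did not reach every mirror."""
    if mirrors_written < num_mirrors:
        degraded[chunk] = True


def resync_returning_drive(degraded, copy_chunk):
    """Copy only the chunks whose bit is set, then clear the bit.
    copy_chunk is a caller-supplied callback (hypothetical)."""
    for chunk, dirty in enumerate(degraded):
        if dirty:
            copy_chunk(chunk)
            degraded[chunk] = False


degraded = [False] * 4                      # all zeros while every drive works
record_write_result(degraded, 0, 2, 2)      # reached both mirrors
record_write_result(degraded, 2, 1, 2)      # one mirror was missing
copied = []
resync_returning_drive(degraded, copied.append)
```

In this example only chunk 2 is copied when the drive comes back, and the bitmap returns to all zeros.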
> >> I don't see a particular need to sync the drives at device creation
> >> time, but I would like to keep the option of doing so. I don't
> >> really care which behaviour is the default.
> >>
> >> NeilBrown
> >
> >
>