From: Bill Davidsen <davidsen@tmr.com>
To: Farkas Levente <lfarkas@lfarkas.org>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
Date: Fri, 13 Feb 2009 12:02:21 -0500
Message-ID: <4995A79D.9050701@tmr.com>
In-Reply-To: <49940539.3030008@lfarkas.org>
Farkas Levente wrote:
> NeilBrown wrote:
>
>> On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
>>
>>> NeilBrown wrote:
>>>
>>>> Hi,
>>>> following is my current patch queue for 2.6.30, in case anyone would
>>>> like to review or otherwise comment.
>>>> They should show up in -next shortly.
>>>>
>>>> Probably the most interesting are the last few which provide support
>>>> for converting a raid1 into a raid5, and a raid5 into a raid6.
>>>> I plan to do some more work here so the code might change a bit before
>>>> final submission, as I work out how best to factor the code.
>>>>
>>>> mdadm doesn't currently support these conversions, but you can
>>>> simply
>>>> echo raid5 > /sys/block/md0/md/level
>>>> to change a 2-drive raid1 into a raid5. Similarly for 5->6.
>>>>
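The sysfs route described above amounts to writing a short string to the array's `level` attribute. A minimal Python sketch, assuming the array is `md0` and the running kernel carries the takeover patches from this series; `set_md_level` is a hypothetical helper, not part of mdadm or any library:

```python
from pathlib import Path


def set_md_level(md_sysfs_dir: str, level: str) -> str:
    """Write a new RAID level to an md array's sysfs 'level' attribute.

    Hypothetical helper, equivalent to
        echo raid5 > /sys/block/md0/md/level
    Returns the value read back from the attribute afterwards.
    A change the kernel refuses surfaces as an OSError from the write.
    """
    attr = Path(md_sysfs_dir) / "level"
    # sysfs attributes take a single short line of text.
    attr.write_text(level + "\n")
    # Read back what the attribute reports now.
    return attr.read_text().strip()
```

Usage on a live system would be `set_md_level("/sys/block/md0/md", "raid5")` as root. Taking the directory as a parameter lets the helper be exercised against a scratch directory instead of a real array.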
>>> Any plans for non-raid to raid1, or anything else? In Windows one can
>>> convert a normal partition into a mirrored one online.
>>>
>> No plan exactly, but I do think about it from time to time.
>>
>> There are two problems with this, and solving just one of them
>> doesn't help you much. So you really have to solve both at once,
>> which reduces the motivation towards either ....
>>
>> One problem is the task of changing the implementation of the device
>> underneath the filesystem without the filesystem needing to care.
>>
>> i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
>> IO, then mdadm steps in and magically changes things so that /dev/sda1
>> is now on a raid1 array which happens to access the same data, but
>> through a different code path.
>> Figuring out exactly which data structure to install the redirection in,
>> and how to do it in a way that is guaranteed to be safe, is non-trivial.
>>
>> dm has a mechanism to change the implementation under a given dm
>> device, and md now has a mechanism to change the implementation
>> under a given md device. But generalising that to 'any device' is
>> not entirely trivial. Now that I have done it for md I'm in a better
>> position to understand how it might be done.
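The core idea of changing the implementation under an open device is a single level of indirection: the opener keeps one stable handle while the code behind it is replaced. The following toy Python model illustrates only that idea. It is not how md or dm do it; all class names are hypothetical, and the genuinely hard part (quiescing in-flight I/O and finding a safe place in the kernel to hook the redirection) is exactly what it leaves out:

```python
class LinearBackend:
    """Toy backend: blocks live directly on one 'disk' (a dict)."""

    def __init__(self, disk):
        self.disk = disk

    def read(self, block):
        return self.disk.get(block, b"\0")

    def write(self, block, data):
        self.disk[block] = data


class Raid1Backend:
    """Toy mirror: writes go to every member, reads come from the first."""

    def __init__(self, members):
        self.members = members

    def read(self, block):
        return self.members[0].get(block, b"\0")

    def write(self, block, data):
        for member in self.members:
            member[block] = data


class StableDevice:
    """The handle an opener (e.g. a filesystem) holds.

    Every request passes through self._backend, so the implementation
    can be swapped without the opener noticing.
    """

    def __init__(self, backend):
        self._backend = backend

    def read(self, block):
        return self._backend.read(block)

    def write(self, block, data):
        self._backend.write(block, data)

    def swap_backend(self, new_backend):
        # A real kernel would first have to drain in-flight I/O; this
        # single-threaded toy can simply replace the pointer.
        self._backend = new_backend
```

For example, a filesystem could write through a `StableDevice(LinearBackend(disk))`, then after `swap_backend(Raid1Backend([disk, copy_of_disk]))` keep using the same handle while its writes now land on both members.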
>>
>> The other problem is where to store the metadata. You need at least a
>> few bytes, and realistically 1K, of space on each device that md is free
>> to use for recording device state, so that arrays can be
>> assembled correctly.
>>
>> One idea I had was to get the filesystem to allocate a block and make that
>> available to md; md would then copy the data from the last block of the
>> device into that block and redirect all IO requests aimed at the
>> last block so that they really access the relocated block. Then md puts
>> its metadata in that last block.
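The relocation idea can be modelled as a block remapping: one donated block takes over the contents of the device's last block, and only requests for the last block are redirected. A minimal Python sketch, assuming a device represented as a list of per-block byte strings; `RelocatingDevice` and its methods are hypothetical names for illustration only:

```python
class RelocatingDevice:
    """Toy model: the filesystem donates a free block, md copies the
    device's last block into it, and from then on filesystem I/O aimed at
    the last block is redirected to the donated block. The real last
    block is then free to hold md metadata."""

    def __init__(self, blocks, spare_block):
        self.blocks = blocks                # one bytes object per block
        self.last = len(blocks) - 1
        if spare_block == self.last:
            raise ValueError("the donated block cannot be the last block")
        self.spare = spare_block
        # Preserve the filesystem's view of the last block.
        self.blocks[self.spare] = self.blocks[self.last]

    def _map(self, block):
        # Only the last block is redirected; everything else maps 1:1.
        return self.spare if block == self.last else block

    def read(self, block):
        return self.blocks[self._map(block)]

    def write(self, block, data):
        self.blocks[self._map(block)] = data

    def write_metadata(self, data):
        # md's own metadata path bypasses the redirection.
        self.blocks[self.last] = data
```

The fragility discussed next is visible here: nothing stops the filesystem (or a later fsck that decides the donated block is free) from writing to `spare_block` directly, which would silently overwrite the relocated data.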
>>
>> This could work but is a little too error prone for my liking. e.g.
>> if you fsck the device, you suddenly lose your guarantee that
>> the filesystem isn't going to write to that relocation block.
>>
>> I think it could only work if mdadm can inspect the device and ensure
>> that the last block isn't part of any partition, or any active filesystem.
>> This is possible, but messy.
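The partition half of that check reduces to an interval test: the last few sectors must lie beyond the end of every partition. A minimal Python sketch, assuming partitions are given as inclusive `(start, end)` sector pairs and that md needs 1K, i.e. two 512-byte sectors; `tail_is_free` is a hypothetical helper, and it does not detect a filesystem created directly on the whole disk:

```python
def tail_is_free(partitions, device_sectors, needed_sectors=2):
    """Return True if the last `needed_sectors` sectors of the device lie
    outside every partition.

    `partitions` is a list of inclusive (start, end) sector pairs.
    Probing for a filesystem on the unpartitioned whole device is not
    modelled here.
    """
    tail_start = device_sectors - needed_sectors
    # A partition overlaps the tail iff it ends at or after tail_start.
    return all(end < tail_start for _start, end in partitions)
```

A partition table that itself keeps data at the end of the disk (GPT stores its backup header in the last sectors) would make the tail unavailable even when this test passes.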
>>
>> e.g. on my notebook, which has a 250GB drive, whatever I used to partition
>> it (cfdisk?) insisted on using whole cylinders for partitions
>> (what an out-of-date concept!), and the reported geometry is
>>
>> Disk /dev/sda: 250.0 GB, 250059350016 bytes
>> 255 heads, 63 sectors/track, 30401 cylinders
>>
>> There are 5103 unused sectors at the end - plenty of room for
>> md to put some metadata. But if someone else had used sfdisk,
>> I think they would find no spare space and be unhappy.
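The tail-space figure follows from the fdisk geometry reported above: 250059350016 bytes is 488397168 sectors of 512 bytes, a cylinder is 255 × 63 = 16065 sectors, and 30401 whole cylinders use 488392065 sectors. A small Python sketch of that arithmetic, assuming 512-byte sectors and that every partition ends on a cylinder boundary; `unused_tail_sectors` is a hypothetical helper:

```python
SECTOR_BYTES = 512


def unused_tail_sectors(total_bytes, heads, sectors_per_track):
    """Sectors left over after the last whole cylinder, i.e. what a
    cylinder-aligned partitioner leaves unused at the end of the disk."""
    total_sectors = total_bytes // SECTOR_BYTES
    sectors_per_cylinder = heads * sectors_per_track
    return total_sectors % sectors_per_cylinder


print(unused_tail_sectors(250059350016, 255, 63))  # → 5103
```

That is about 2.5 MiB, far more than the 1K md needs. A partitioner that does not round to cylinders can use every sector, leaving a remainder of zero.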
>>
>> Maybe it is sufficient to support just those people who are
>> lucky enough to not be using the whole device...
>>
>>
>> So it might happen, but it is just a little too easy to stick this
>> one in the too-hard basket.
>>
>
> The main reason is real life. I have seen many cases where a system was
> installed on a single disk and later it would have been nice to make it
> redundant (as most sysadmins say: it doesn't work on Linux, but it even
> works on Windows; just put in a new disk and make it a mirror).
> So I don't know the technical details, but it would be a very useful feature.
>
>
I think you can get there for normal filesystem data by creating a
raid1 on the new drive with its second member missing (in effect, a
failed drive). Then copy the data from the unmirrored drive to the
filesystem on the new array, unmount the original drive and mount the
array in its place, then add the original drive to the new array. This
is ugly, and a verified backup and restore is better, but it can be done.
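The procedure above can be written out as an ordered list of commands. A minimal Python sketch that only builds the argv lists and runs nothing, assuming ext3 as the new filesystem and using hypothetical device names and mount points in the example; `raid1_migration_plan` is a hypothetical helper, while the `mdadm` options shown (`--create`, `--level`, `--raid-devices`, the `missing` keyword, and `--add`) are standard mdadm usage:

```python
def raid1_migration_plan(old_dev, new_dev, md_dev, tmp_mount, old_mount,
                         mkfs="mkfs.ext3"):
    """Return the migration steps as argv lists, in order.

    Nothing is executed; each step is meant to be reviewed and run by
    hand (for example with subprocess.run(step, check=True)).
    """
    return [
        # Degraded raid1 on the new drive; "missing" leaves the second
        # slot empty for the original drive.
        ["mdadm", "--create", md_dev, "--level=1", "--raid-devices=2",
         new_dev, "missing"],
        # New filesystem on the array, mounted somewhere temporary.
        [mkfs, md_dev],
        ["mount", md_dev, tmp_mount],
        # Copy everything, preserving ownership, modes, links and dotfiles.
        ["cp", "-a", old_mount + "/.", tmp_mount + "/"],
        # Swap mounts: the array takes over the original mount point.
        ["umount", old_mount],
        ["umount", tmp_mount],
        ["mount", md_dev, old_mount],
        # Add the original partition; md rebuilds the mirror onto it.
        ["mdadm", md_dev, "--add", old_dev],
    ]
```

For example, `raid1_migration_plan("/dev/sda1", "/dev/sdb1", "/dev/md0", "/mnt/new", "/data")` yields eight steps. Writes that reach the original filesystem after the copy starts are lost, so it should be idle or read-only during the copy. The final `--add` also needs the original partition to be at least as large as the array's component size.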
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck