From: Bill Davidsen <davidsen@tmr.com>
To: Farkas Levente <lfarkas@lfarkas.org>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: [PATCH 00/18] Assorted md patches headed for 2.6.30
Date: Fri, 13 Feb 2009 12:02:21 -0500
Message-ID: <4995A79D.9050701@tmr.com>
In-Reply-To: <49940539.3030008@lfarkas.org>

Farkas Levente wrote:
> NeilBrown wrote:
>   
>> On Thu, February 12, 2009 8:42 pm, Farkas Levente wrote:
>>     
>>> NeilBrown wrote:
>>>       
>>>> Hi,
>>>>  following is my current patch queue for 2.6.30, in case anyone would
>>>> like to review or otherwise comment.
>>>> They should show up in -next shortly.
>>>>
>>>> Probably the most interesting are the last few which provide support
>>>> for converting a raid1 into a raid5, and a raid5 into a raid6.
>>>> I plan to do some more work here so the code might change a bit before
>>>> final submission, as I work out how best to factor the code.
>>>>
>>>> mdadm doesn't currently support these conversions, but you can
>>>> simply
>>>>    echo raid5 > /sys/block/md0/md/level
>>>> to change a 2-drive raid1 into a raid5.  Similarly for 5->6
>>>>         
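
For anyone trying this, it is worth looking at the current level before
writing the new one. A minimal sketch, assuming a kernel that carries the
takeover patches from this series. The function name and the SYSFS_ROOT
override are my own inventions, there only so it can be tried against a
scratch directory first:

```sh
#!/bin/sh
# Sketch: report an array's current level, then request a new one via sysfs.
# md_set_level and SYSFS_ROOT are hypothetical names, not part of mdadm or md.
md_set_level() {
    dev=$1 new_level=$2
    attr="${SYSFS_ROOT:-/sys}/block/$dev/md/level"
    # Refuse early if the attribute is absent or not writable (e.g. not root).
    [ -w "$attr" ] || { echo "cannot write $attr" >&2; return 1; }
    echo "$dev: $(cat "$attr") -> $new_level"
    echo "$new_level" > "$attr"
}
```

As root, `md_set_level md0 raid5` would do the same thing as the echo
above. Whether the kernel accepts the write depends on the personality
supporting that particular takeover.
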
>>> any plan for non-raid to raid1, or anything else? like in windows, one can
>>> convert a normal partition into a mirrored one online.
>>>       
>> No plan exactly, but I do think about it from time to time.
>>
>> There are two problems with this, and solving just one of them
>> doesn't help you much.  So you really have to solve both at once,
>> which reduces the motivation towards either ....
>>
>> One problem is the task of changing the implementation of the device
>> underneath the filesystem without the filesystem needing to care.
>>
>> i.e. the filesystem opens block device 8,1 (/dev/sda1) and starts doing
>> IO, then mdadm steps in and magically changes things so that /dev/sda1
>> is now on a raid1 array which happens to access the same data, but
>> through a different code path.
>> Figuring out exactly in which data structure to install the redirection,
>> and how to do it in a way that is guaranteed to be safe, is non-trivial.
>>
>> dm has a mechanism to change the implementation under a given dm
>> device, and md now has a mechanism to change the implementation
>> under a given md device.  But generalising that to 'any device' is
>> not entirely trivial.  Now that I have done it for md I'm in a better
>> position to understand how it might be done.
>>
>> The other problem is where to store the metadata.  You need at least a
>> few bytes and realistically 1K of space on the devices that is free to
>> be used by md to record information about device state to allow arrays to
>> be assembled correctly.
>>
>> One idea I had was to get the filesystem to allocate a block and make that
>> available to md, then md would copy the data from the last block of the
>> device into that block and redirect all IO requests aimed at the
>> last block so that they really access the relocated block.  Then md puts
>> its metadata in that last block.
>>
>> This could work but is a little too error prone for my liking.  e.g.
>> if you fsck the device, you suddenly lose your guarantee that
>> the filesystem isn't going to write to that relocation block.
>>
>> I think it could only work if mdadm can inspect the device and ensure
>> that the last block isn't part of any partition, or any active filesystem.
>> This is possible, but messy.
>>
>> e.g. on my notebook which has a 250Gig drive whatever I used to partition
>> it (cfdisk?) insisted on using multiples of cylinders for partitions
>> (what an out-of-date concept!) and as the reported geometry is
>>
>> Disk /dev/sda: 250.0 GB, 250059350016 bytes
>> 255 heads, 63 sectors/track, 30401 cylinders
>>
>> There are 5103 unused sectors at the end - plenty of room for
>> md to put some metadata.  But if someone else had used sfdisk,
>> I think they would find no spare space and be unhappy.
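
The spare space falls straight out of the reported geometry: the total
number of sectors minus the sectors covered by whole cylinders. A minimal
sketch of that arithmetic, assuming 512-byte sectors (the function name is
mine, nothing from fdisk):

```sh
#!/bin/sh
# Sectors left over past the last whole cylinder, from the reported geometry.
# Assumes 512-byte sectors; spare_sectors is a hypothetical helper name.
spare_sectors() {
    bytes=$1 heads=$2 sectors_per_track=$3 cylinders=$4
    total=$(( bytes / 512 ))
    in_cylinders=$(( heads * sectors_per_track * cylinders ))
    echo $(( total - in_cylinders ))
}

# Geometry as reported above for /dev/sda.
spare_sectors 250059350016 255 63 30401
```

For this drive that comes to a few thousand sectors, a couple of megabytes,
against the 1K or so that md realistically needs.
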
>>
>> Maybe it is sufficient to support just those people who are
>> lucky enough to not be using the whole device...
>>
>>
>> So it might happen, but it is just a little too easy to stick this
>> one in the too-hard basket.
>>     
>
> the main reason here is real life. i have seen many cases where a system
> was installed on a single disk and later it would be nice to make it
> redundant (and most sysadmins say: it doesn't work on linux, yet it even
> works on windows, where you just put in a new disk and make it a mirror).
> so i don't know the technical details, but it would be a very useful feature.
>
>   
I think you can get there for normal filesystem data by creating a
degraded raid1 on a new drive, with the second member given as missing.
Then copy the data from the unmirrored drive to the filesystem on the
array, unmount the original drive and mount the array in its place, and
add the original drive to the new array. This is ugly, and a verified
backup and restore is better, but it can be done.
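
Roughly, the steps would look like the sketch below. It is a dry run that
only prints the commands for review. Every device name and mount point in
it is a placeholder I made up, and ext3 is just an example filesystem, so
adjust before running anything for real:

```sh
#!/bin/sh
# Dry run: prints the commands for review instead of executing them.
# Every device name and mount point below is a hypothetical placeholder.
OLD_PART=/dev/sda1   # existing unmirrored partition, mounted on DATA_MNT
NEW_PART=/dev/sdb1   # partition on the new drive
ARRAY=/dev/md0       # raid1 array to create
DATA_MNT=/data       # where the data lives today
TMP_MNT=/mnt/md0     # scratch mount point used during the copy

plan() {
    # Degraded two-device raid1: the second slot is the keyword "missing".
    echo "mdadm --create $ARRAY --level=1 --raid-devices=2 $NEW_PART missing"
    # New filesystem on the array (ext3 is only an example choice).
    echo "mkfs -t ext3 $ARRAY"
    echo "mount $ARRAY $TMP_MNT"
    # Copy everything, preserving ownership, modes and timestamps.
    echo "cp -a $DATA_MNT/. $TMP_MNT/"
    # Swap the array in where the old partition was mounted.
    echo "umount $TMP_MNT"
    echo "umount $DATA_MNT"
    echo "mount $ARRAY $DATA_MNT"
    # Only now hand the old partition to md; the resync overwrites it.
    echo "mdadm $ARRAY --add $OLD_PART"
}

plan
```

The last step is the point of no return, since the resync overwrites the
original partition, so check the copy before running it. The original
partition also has to be at least as large as the member on the new drive,
or the add will be refused. Resync progress shows up in /proc/mdstat.
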

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck


