From: Phil Turmel <philip@turmel.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: md road-map: 2011
Date: Wed, 16 Feb 2011 19:11:14 -0500
Message-ID: <4D5C67A2.5090306@turmel.org>
In-Reply-To: <20110217084403.46b4d8ee@notabene.brown>
On 02/16/2011 04:44 PM, NeilBrown wrote:
[trim /]
> On Wed, 16 Feb 2011 14:37:26 -0500 Phil Turmel <philip@turmel.org> wrote:
>> It occurred to me that if you go to the trouble (and space and performance)
>> to create and maintain metadata for lists of bad blocks, and separate
>> metadata for sync status aka "trim", or hot-replace status, or reshape-status,
>> or whatever features are dreamt up later, why not create an infrastructure to
>> carry all of it efficiently?
>>
>> David Brown suggested a multi-level metadata structure. I concur, but somewhat
>> more generic:
>> Level 1: Coarse bitmap, set bit indicates 'look at level 2'
>> Level 2: Fine bitmap, set bit indicates 'look at level 3'
>> Level 3: Extent list, with starting block, length, and feature payload
>>
>> The bitmap levels are purely for hot-path performance.
>>
>> As an option, it should be possible to spread the detailed metadata through the
>> data area, possibly in chunk-sized areas spread out at some user-defined
>> interval. "meta-span", perhaps. Then resizing partitions that compose an
>> array would be less likely to bump up against metadata size limits. The coarse
>> bitmap should stay near the superblock, of course.
>
> This is starting to sound a lot more like a filesystem than a RAID system.
Heh. But if you are going to start adding block and/or block-extent metadata for
a variety of features, common code and storage for it should be an all-around win.
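To make the three-level idea quoted above concrete, here is a minimal Python sketch of the lookup path. It is not md code. The class name BlockMeta, the region sizes (COARSE_SHIFT, FINE_SHIFT) and the use of Python sets in place of on-disk bitmaps are all assumptions for illustration. Clean regions are rejected at level 1 or 2 without touching the extent list, and extents for different features (bad blocks, trim, reshape) share one sorted list.

```python
from bisect import bisect_right, insort

# Hypothetical granularities, not md's: one level-1 bit per 2**20 blocks,
# one level-2 bit per 2**12 blocks.
COARSE_SHIFT = 20
FINE_SHIFT = 12


class BlockMeta:
    """Three-level per-block metadata: coarse bitmap -> fine bitmap -> extents.

    Python sets of region indices stand in for the on-disk bitmaps.
    """

    def __init__(self):
        self.coarse = set()   # level 1: coarse regions holding any extent
        self.fine = set()     # level 2: fine regions holding any extent
        self.extents = []     # level 3: (start, length, payload), sorted

    def add(self, start, length, payload):
        if length <= 0:
            raise ValueError("extent length must be positive")
        first = start >> FINE_SHIFT
        last = (start + length - 1) >> FINE_SHIFT
        # Mark every fine region the extent touches, and its coarse parent.
        for region in range(first, last + 1):
            self.fine.add(region)
            self.coarse.add(region >> (COARSE_SHIFT - FINE_SHIFT))
        insort(self.extents, (start, length, payload))

    def lookup(self, block):
        """Return payloads of every extent covering `block`."""
        # Level 1: the hot path for clean regions stops here.
        if (block >> COARSE_SHIFT) not in self.coarse:
            return []
        # Level 2: finer filter before touching the extent list.
        if (block >> FINE_SHIFT) not in self.fine:
            return []
        # Level 3: (block + 1,) sorts after every tuple whose start <= block,
        # so extents[:hi] are the candidates; keep those that reach `block`.
        hi = bisect_right(self.extents, (block + 1,))
        return [p for (s, n, p) in self.extents[:hi] if block < s + n]
```

Removal of extents (e.g. when a bad block is later rewritten successfully) would also need to clear bitmap bits once a region becomes empty; that is left out here.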
> I really don't want there to be so much metadata that I am tempted to spread
> it out among the data. I think that implies too much complexity.
It would be complex, yes. Same math as computing block locations within raid 5
stripes, though.
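A rough sketch of that arithmetic, assuming a hypothetical layout in which every `span` data blocks are followed by one `chunk`-block metadata area, starting at device offset 0 (a real array would also add its data offset). The function names are invented for illustration.

```python
def data_to_device(block, span, chunk):
    """Device offset of logical data block `block`.

    Assumed layout (hypothetical): `span` data blocks, then one
    `chunk`-block metadata area, repeating from device offset 0.
    """
    period, offset = divmod(block, span)
    return period * (span + chunk) + offset


def device_to_data(dev, span, chunk):
    """Inverse mapping; returns None if `dev` lies in a metadata area."""
    period, offset = divmod(dev, span + chunk)
    if offset >= span:
        return None
    return period * span + offset


def meta_area_for(block, span, chunk):
    """Device offset of the metadata area that describes data block `block`."""
    return (block // span) * (span + chunk) + span
```

Under this layout, growing a member device only appends more data-plus-metadata periods, so existing data offsets do not move; that is what would make resizing less likely to hit a fixed metadata size limit.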
> Maybe that is a good place to draw the line: If some metadata doesn't fit
> easily at the start or end of the devices, it has no place in RAID - you
> should add it to a filesystem instead.
I think that's arbitrary, but it's moot until someone tries to implement it.
>> Personally, I'd like to see the bad-block feature actually perform block
>> remapping, much like hard drives themselves do, but with the option to unmap the
>> block if a later write succeeds. Using one retry per array restart as you
>> described makes a lot of sense. In any case, remapping would retain redundancy
>> where applicable short of full drive failure or remap overflow.
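A minimal sketch of that remapping policy, assuming a per-device pool of spare blocks. RemapTable and all of its methods are hypothetical and do not correspond to anything in md; the sketch only shows the bookkeeping: redirect on a write error, at most one retry of the original location per array start, unmap when a retry succeeds, and an error on remap overflow.

```python
class RemapTable:
    """Hypothetical per-device bad-block remap table (not an md API)."""

    def __init__(self, spare_start, spare_count):
        self.free = list(range(spare_start, spare_start + spare_count))
        self.remapped = {}    # original block -> spare block
        self.retried = set()  # originals already retried since array start

    def resolve(self, block):
        """Where I/O for `block` should currently go."""
        return self.remapped.get(block, block)

    def on_write_error(self, block):
        """Redirect `block` to a spare, keeping redundancy until spares run out."""
        if block in self.remapped:
            return self.remapped[block]
        if not self.free:
            raise OSError("remap overflow: no spare blocks left")
        spare = self.free.pop(0)
        self.remapped[block] = spare
        return spare

    def try_original(self, block):
        """Allow one retry of the original location per array start."""
        if block in self.remapped and block not in self.retried:
            self.retried.add(block)
            return True
        return False

    def on_retry_success(self, block):
        """The caller rewrote current data to the original: unmap, free spare."""
        self.free.append(self.remapped.pop(block))

    def on_array_start(self):
        """New assembly: every remapped block gets one more retry."""
        self.retried.clear()
```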
>
> If the hard drives already do this, why should md try to do it as well??
> If a hard drive has had so many write errors that it has used up all of its
> spare space, then it is long past time to replace it.
True enough.
>> My $0.02, of course.
>
> Here in .au, the smallest legal tender is $0.05 - but thanks anyway :-)
I guess the offer of "a penny for your thoughts" doesn't work down under ;)
Phil