From: Bill Davidsen <davidsen@tmr.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Neil Brown <neilb@suse.de>,
	david@lang.hm, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org
Subject: Re: limits on raid
Date: Thu, 21 Jun 2007 19:03:29 -0400
Message-ID: <467B03C1.50809@tmr.com>
In-Reply-To: <46756BE2.7010401@tmr.com>

I didn't get a comment on my suggestion for a quick and dirty fix for 
the --assume-clean issues...

Bill Davidsen wrote:
> Neil Brown wrote:
>> On Thursday June 14, david@lang.hm wrote:
>>  
>>> it's now churning away 'rebuilding' the brand new array.
>>>
>>> a few questions/thoughts.
>>>
>>> why does it need to do a rebuild when making a new array? couldn't 
>>> it just zero all the drives instead? (or better still just record 
>>> most of the space as 'unused' and initialize it as it starts using 
>>> it?)
>>>     
>>
>> Yes, it could zero all the drives first.  But that would take the same
>> length of time (unless p/q generation was very very slow), and you
>> wouldn't be able to start writing data until it had finished.
>> You can "dd" /dev/zero onto all drives and then create the array with
>> --assume-clean if you want to.  You could even write a shell script to
>> do it for you.
>>
>> Yes, you could record which space is used vs unused, but I really
>> don't think the complexity is worth it.
>>
>>   
> How about a simple solution which would get an array online and still 
> be safe? All it would take is a flag which forced reconstruct writes 
> for RAID-5. You could set it with an option, or automatically if 
> someone uses --assume-clean with --create, and leave it in the 
> superblock until the first "repair" runs to completion. And for repair 
> you could assume that bad parity was not caused by an error but was 
> simply never written.
>
> Thought 2: I think the unwritten bit is easier than you think, you 
> only need it on parity blocks for RAID5, not on data blocks. When a 
> write is done, if the bit is set do a reconstruct, write the parity 
> block, and clear the bit. Keeping a bit per data block is madness, and 
> appears to be unnecessary as well.
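
To make the two thoughts above concrete, here is a toy model in Python. It is only a sketch, not md's code: the Stripe class, the parity_unwritten flag and the block contents are all made up for illustration. It shows why a read-modify-write on a stripe whose parity was never computed stays inconsistent, and how one "unwritten" bit per parity block forces a reconstruct write on first use.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Stripe:
    """One RAID-5 stripe: data blocks plus one parity block (hypothetical model)."""

    def __init__(self, data, parity, parity_unwritten):
        self.data = list(data)
        self.parity = parity
        # Assumed flag: True means the parity block has never been computed.
        self.parity_unwritten = parity_unwritten

    def write(self, idx: int, new: bytes) -> None:
        if self.parity_unwritten:
            # Reconstruct write: parity is rebuilt from all data blocks,
            # so the stale on-disk parity is never trusted.
            self.data[idx] = new
            self.parity = xor_blocks(*self.data)
            self.parity_unwritten = False
        else:
            # Read-modify-write: new parity = old parity ^ old data ^ new data.
            # This is only correct if the old parity was already consistent.
            self.parity = xor_blocks(self.parity, self.data[idx], new)
            self.data[idx] = new

    def consistent(self) -> bool:
        return self.parity == xor_blocks(*self.data)


# Leftover contents on freshly created, never-synced drives (made-up values).
garbage_data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
garbage_parity = b"\xde\xad"

# Without the bit: the stale parity is trusted and patched in place.
plain = Stripe(garbage_data, garbage_parity, parity_unwritten=False)
plain.write(0, b"\xaa\xbb")

# With the bit: the first write to the stripe rebuilds the parity.
flagged = Stripe(garbage_data, garbage_parity, parity_unwritten=True)
flagged.write(0, b"\xaa\xbb")

print("plain consistent:", plain.consistent())
print("flagged consistent:", flagged.consistent())
```

The same XOR also explains the dd suggestion: if every data block is zero, the correct parity is zero as well, so drives that were zeroed beforehand are already consistent.
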
>>> while I consider zfs to be ~80% hype, one advantage it could have 
>>> (but I don't know if it has) is that since the filesystem and raid 
>>> are integrated into one layer they can optimize the case where files 
>>> are being written onto unallocated space and instead of reading 
>>> blocks from disk to calculate the parity they could just put zeros 
>>> in the unallocated space, potentially speeding up the system by 
>>> reducing the amount of disk I/O.
>>>     
>>
>> Certainly.  But the raid doesn't need to be tightly integrated
>> into the filesystem to achieve this.  The filesystem need only know
>> the geometry of the RAID and when it comes to write, it tries to write
>> full stripes at a time.  If that means writing some extra blocks full
>> of zeros, it can try to do that.  This would require a little bit
>> better communication between filesystem and raid, but not much.  If
>> anyone has a filesystem that they want to be able to talk to raid
>> better, they need only ask...
>>  
>>  
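
A rough sketch of the full-stripe idea, in Python. The function name and parameters are hypothetical, and it assumes the filesystem already knows that everything around the write is unallocated, since that is the only case where padding with zeros is safe. Given the chunk size and the number of data disks, it widens a write to whole stripes so parity can be computed from the data being written, with no reads.

```python
def pad_to_full_stripes(offset: int, data: bytes, chunk_size: int, data_disks: int):
    """Widen a write to full-stripe boundaries, padding with zeros.

    Assumption: the padded region is unallocated, so overwriting it
    with zeros loses nothing. Returns (aligned_offset, padded_data).
    """
    stripe = chunk_size * data_disks               # data bytes per full stripe
    start = offset - offset % stripe               # round start down
    end = -(-(offset + len(data)) // stripe) * stripe  # round end up
    head = bytes(offset - start)                   # zeros before the data
    tail = bytes(end - (offset + len(data)))       # zeros after the data
    return start, head + data + tail


# Made-up geometry: 4-byte chunks across 3 data disks, i.e. 12-byte stripes.
aligned_offset, padded = pad_to_full_stripes(14, b"abcde", chunk_size=4, data_disks=3)
print(aligned_offset, len(padded))
```
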
>>> is there any way that linux would be able to do this sort of thing? 
>>> or is it impossible due to the layering preventing the necessary 
>>> knowledge from being in the right place?
>>>     
>>
>> Linux can do anything we want it to.  Interfaces can be changed.  All
>> it takes is a fairly well defined requirement, and the will to make it
>> happen (and some technical expertise, and lots of time .... and
>> coffee?).
>>   
> Well, I gave you two thoughts, one which would be slow until a repair 
> but sounds easy to do, and one which is slightly harder but works 
> better and minimizes performance impact.
>


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

