Re: Split RAID: Proposal for archival RAID using incremental batch checksum

From: Ethan Wilson <ethan.wilson@shiftmail.org>
To: linux-raid@vger.kernel.org
Subject: Re: Split RAID: Proposal for archival RAID using incremental batch checksum
Date: Wed, 29 Oct 2014 20:27:52 +0100
Message-ID: <54513FB8.2050407@shiftmail.org>
In-Reply-To: <CAK-d5dah-NyQzVNBScYoVSo2cpGA8F3vuK_Zh1YzQn5Mr+_-oQ@mail.gmail.com>

On 29/10/2014 10:25, Anshuman Aggarwal wrote:
> Right on most counts but please see comments below.
>
> On 29 October 2014 14:35, NeilBrown <neilb@suse.de> wrote:
>> Just to be sure I understand, you would have N + X devices.  Each of the N
>> devices contains an independent filesystem and could be accessed directly if
>> needed.  Each of the X devices contains some codes so that if at most X
>> devices in total died, you would still be able to recover all of the data.
>> If more than X devices failed, you would still get complete data from the
>> working devices.
>>
>> Every update would only write to the particular N device on which it is
>> relevant, and  all of the X devices.  So N needs to be quite a bit bigger
>> than X for the spin-down to be really worth it.
>>
>> Am I right so far?
> Perfectly right so far. I typically have a N to X ratio of 4 (4
> devices to 1 data) so spin down is totally worth it for data
> protection but more on that below.
>
>> For some reason the writes to X are delayed...  I don't really understand
>> that part.
> This delay is basically designed around archival devices which are
> rarely read from and even more rarely written to. By delaying writes
> on 2 criteria ( designated cache buffer filling up or preset time
> duration from last write expiring) we can significantly reduce the
> writes on the parity device. This assumes that we are ok to lose a
> movie or two in case the parity disk is not totally up to date but are
> more interested in device longevity.
>
>> Sounds like multi-parity RAID6 with no parity rotation and
>>    chunksize == devicesize
> RAID6 would present us with a joint device and currently only allows
> writes to that directly, yes? Any writes will be striped.

I am not totally sure I understand your design, but it seems to me that 
the following solution could work for you:

MD raid-6, maybe multi-parity (multi-parity is not implemented in MD
yet, but with a periodic scrub 2 parities can be fine; wake-up is
not so expensive that you can't scrub).
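
A minimal sketch of how the periodic scrub could be triggered, for example from a cron job. It assumes the array's md sysfs directory is passed in (something like /sys/block/md0/md, where md0 is only an assumed device name) and uses the md `sync_action` attribute; the function name `start_scrub` is hypothetical.

```python
from pathlib import Path


def start_scrub(md_dir):
    """Request a consistency check if the array is currently idle.

    md_dir is the array's md sysfs directory, e.g. /sys/block/md0/md
    (md0 is an assumed name). Returns True if a check was requested.
    """
    action = Path(md_dir) / "sync_action"
    if action.read_text().strip() != "idle":
        # A resync, recovery or check is already in progress.
        return False
    action.write_text("check\n")
    return True
```

Writing to the real sysfs file needs root, and the scrub wakes every member disk, so it would normally be scheduled for a time when the array is allowed to spin up.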

Over that you put a raid1 of 2 x 4TB disks as a bcache cache device
(those two will never spin down) in writeback mode with
writeback_running=off. This will prevent writes to the backend and leave
the backend array spun down.
When bcache is almost full (poll dirty_data), switch to
writeback_running=on and writethrough: this will wake up the backend raid6
array and flush all dirty data. You can then revert to writeback
and writeback_running=off. After this you can spin down the backend
array again.
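
The polling cycle described above could be sketched roughly as follows. This is an illustration under assumptions, not a tested tool: it assumes the bcache sysfs directory of the cached device is passed in (something like /sys/block/bcache0/bcache, where bcache0 is an assumed name), that `writeback_running` takes 0 and 1 for off and on, and that `dirty_data` reports a human-readable value such as `1.5G`. All function names and the threshold are hypothetical.

```python
import time
from pathlib import Path

# Suffixes assumed to appear in the human-readable dirty_data value.
UNITS = {"k": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text):
    """Convert a value such as '1.5G' or '512' to bytes (assumed format)."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def set_attr(bcache_dir, name, value):
    """Write one sysfs attribute under the bcache directory."""
    (Path(bcache_dir) / name).write_text(f"{value}\n")


def dirty_bytes(bcache_dir):
    """Return the amount of dirty data in the cache, in bytes."""
    return parse_size((Path(bcache_dir) / "dirty_data").read_text())


def hold_writes(bcache_dir):
    """Writeback mode with writeback stopped: the backend can stay asleep."""
    set_attr(bcache_dir, "cache_mode", "writeback")
    set_attr(bcache_dir, "writeback_running", 0)


def flush(bcache_dir, poll_seconds=30, sleep=time.sleep):
    """Wake the backend, wait for dirty data to drain, then hold again."""
    set_attr(bcache_dir, "writeback_running", 1)
    set_attr(bcache_dir, "cache_mode", "writethrough")
    while dirty_bytes(bcache_dir) > 0:
        sleep(poll_seconds)
    hold_writes(bcache_dir)


def run_once(bcache_dir, threshold_bytes, **kwargs):
    """Flush only if dirty data has reached the threshold."""
    if dirty_bytes(bcache_dir) >= threshold_bytes:
        flush(bcache_dir, **kwargs)
        return True
    return False
```

Calling `run_once` from a timer with a threshold somewhat below the cache size would give the "almost full" trigger. Spinning the backend disks down afterwards is left out here and would be done separately.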

You also get read caching for free, which helps the backend array to 
stay spun down as much as possible.

Maybe you can modify bcache slightly to implement automatic
switching between the modes as described above, instead of polling the
state from outside.

Would that work, or are you asking for something different?

EW
