From: Boaz Harrosh <bharrosh@panasas.com>
To: lsf-pc@lists.linux-foundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-scsi <linux-scsi@vger.kernel.org>
Cc: "Bhamare, Sachin" <sbhamare@panasas.com>,
	Benny Halevy <bhalevy@tonian.com>,
	"Welch, Brent" <welch@panasas.com>
Subject: [LSF/MM TOPIC] yet another RAID engine
Date: Fri, 3 Feb 2012 02:29:56 +0200
Message-ID: <4F2B2A84.8010205@panasas.com>

I have mentioned this in the past, and people might be interested. If not, please ignore.

In kernel 3.2 I introduced RAID5 support into the ORE (Objects Raid Engine).

The ORE is a pretty compact piece of code with a simple interface:
- Given a RAID layout description and a device table,
- and a page array or BIO supplied on top,
- it produces a set of BIOs on the bottom, one for each device.
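
A rough sketch of that interface, under loud assumptions: the names
`sk_layout`, `sk_extent` and `sk_split` are hypothetical, byte ranges stand
in for real BIOs, and only plain striping is handled. It is not the ORE
code, just an illustration of "layout plus request in, one extent per
device out".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout descriptor: plain striping only. */
struct sk_layout {
	uint64_t stripe_unit;	/* bytes placed on one device before moving on */
	unsigned num_devs;	/* number of entries in the device table */
};

/* Hypothetical stand-in for a per-device BIO. */
struct sk_extent {
	uint64_t dev_offset;	/* device offset of the first byte touched */
	uint64_t length;	/* bytes of this request that land on the device */
};

/*
 * Split the logical range [offset, offset + length) into one extent per
 * device. out[] must hold lo->num_devs entries; devices that are not
 * touched end up with length == 0.
 */
static void sk_split(const struct sk_layout *lo, uint64_t offset,
		     uint64_t length, struct sk_extent *out)
{
	assert(lo->stripe_unit > 0 && lo->num_devs > 0);

	for (unsigned d = 0; d < lo->num_devs; d++)
		out[d].dev_offset = out[d].length = 0;

	while (length > 0) {
		uint64_t unit_no = offset / lo->stripe_unit;	/* global unit index */
		uint64_t in_unit = offset % lo->stripe_unit;
		unsigned dev = (unsigned)(unit_no % lo->num_devs);
		uint64_t stripe_no = unit_no / lo->num_devs;
		uint64_t chunk = lo->stripe_unit - in_unit;

		if (chunk > length)
			chunk = length;
		/* A contiguous logical range stays contiguous on each device,
		 * so remembering the first offset and summing is enough. */
		if (out[dev].length == 0)
			out[dev].dev_offset = stripe_no * lo->stripe_unit + in_unit;
		out[dev].length += chunk;

		offset += chunk;
		length -= chunk;
	}
}
```

For example, with a 4-byte stripe unit over 3 devices, the range [2, 14)
touches device 0 twice (the tail of stripe 0 and the head of stripe 1) and
devices 1 and 2 once each.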

The same generic engine has support for many RAID topologies. I might say
that it supports all the existing topologies that I've encountered, and
some more that do not exist elsewhere.

It's a 3-level RAID topology:
- The bottom-most level has one or more copies (mirrors).
- The middle level supports striping (raid0), raid4, raid5 and raid6.
- The top level supports striping over device groups, or what we call
  raid groups.

So all the combinations of raid10/50/501/51/60/61 and so on and so forth
are simple to express, and much more.
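
To make the three levels concrete, here is a minimal sketch of mapping a
logical byte offset to a device and a device offset. Everything in it is
an assumption for illustration: the struct and field names (`tl_layout`,
`group_width`, `group_depth`, ...) are hypothetical, the middle level is
plain striping without parity, and the arithmetic is the conventional
striped-groups computation rather than the ORE's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for the three levels. */
struct tl_layout {
	uint64_t stripe_unit;	/* bytes per device per stripe */
	unsigned group_width;	/* striped components in one raid group (middle) */
	unsigned group_depth;	/* stripes written to a group before the next one */
	unsigned group_count;	/* number of raid groups (top) */
	unsigned copies;	/* copies of each component, 1 = no mirror (bottom) */
};

struct tl_target {
	unsigned first_dev;	/* device-table index of the first copy */
	unsigned copies;	/* copies sit at first_dev .. first_dev + copies - 1 */
	uint64_t dev_offset;	/* byte offset inside each copy */
};

static struct tl_target tl_map(const struct tl_layout *lo, uint64_t off)
{
	assert(lo->stripe_unit && lo->group_width && lo->group_depth &&
	       lo->group_count && lo->copies);

	uint64_t stripe = lo->stripe_unit * lo->group_width;	/* bytes per stripe */
	uint64_t group_bytes = stripe * lo->group_depth;	/* bytes per group visit */
	uint64_t cycle_bytes = group_bytes * lo->group_count;	/* one pass over all groups */

	uint64_t cycle = off / cycle_bytes;
	uint64_t in_cycle = off % cycle_bytes;
	unsigned group = (unsigned)(in_cycle / group_bytes);
	uint64_t in_group = in_cycle % group_bytes;
	uint64_t stripe_no = in_group / stripe;
	unsigned comp = (unsigned)((in_group % stripe) / lo->stripe_unit);

	struct tl_target t;
	t.first_dev = (group * lo->group_width + comp) * lo->copies;
	t.copies = lo->copies;
	t.dev_offset = (cycle * lo->group_depth + stripe_no) * lo->stripe_unit
		       + off % lo->stripe_unit;
	return t;
}
```

In this sketch a raid10-like layout is simply `copies = 2` with
`group_count = 1`, and striping over several mirrored groups only changes
`group_count`; the mapping code itself stays the same.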

This is all pretty much old news. OK, it might have a bit more, but there is one
advantage to the ORE that does not exist in any other system:

Since the same code changes its output according to a layout descriptor,
the topology is *no longer static*. Directories can be mirrors, large files
raid5, small files raid10, /tmp raid0. Inner disk regions can get smaller
stripes, outer ones larger, you name it. Every write request can have its
own topology. The topology can also be dynamic: devices can be added or
removed online.
Currently, at the bottom level, the BIOs are pushed into T10 OSD objects, and
this is hard-coded. But this can change into an IOer function vector and be fed
to block devices, fs inodes, ... whatever wants to implement the simple
dev_read_bio/dev_write_bio API.
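
A minimal sketch of what such a function vector could look like follows.
Only the two operation names, dev_read_bio and dev_write_bio, come from
the text above; the signatures, `struct io_req` (a flat buffer standing in
for a BIO), the RAM-backed device and `io_mirror_write` are assumptions
made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a BIO: a flat buffer plus a device offset. */
struct io_req {
	uint64_t offset;
	void *buf;
	size_t len;
};

/* Hypothetical function vector; signatures are assumed. */
struct io_ops {
	int (*dev_read_bio)(void *dev, struct io_req *req);
	int (*dev_write_bio)(void *dev, struct io_req *req);
};

/* Example backend: a tiny RAM-backed "device". */
struct ram_dev {
	unsigned char data[64];
};

static int ram_in_range(const struct io_req *req)
{
	size_t size = sizeof(((struct ram_dev *)0)->data);

	return req->offset <= size && req->len <= size - req->offset;
}

static int ram_read(void *dev, struct io_req *req)
{
	struct ram_dev *rd = dev;

	if (!ram_in_range(req))
		return -1;
	memcpy(req->buf, rd->data + req->offset, req->len);
	return 0;
}

static int ram_write(void *dev, struct io_req *req)
{
	struct ram_dev *rd = dev;

	if (!ram_in_range(req))
		return -1;
	memcpy(rd->data + req->offset, req->buf, req->len);
	return 0;
}

static const struct io_ops ram_ops = {
	.dev_read_bio = ram_read,
	.dev_write_bio = ram_write,
};

/* Engine side: send the same write to every copy through the vector. */
static int io_mirror_write(const struct io_ops *ops, void **devs,
			   unsigned n, struct io_req *req)
{
	assert(ops && ops->dev_write_bio);

	for (unsigned i = 0; i < n; i++) {
		int err = ops->dev_write_bio(devs[i], req);

		if (err)
			return err;
	}
	return 0;
}
```

With this shape the engine only sees `struct io_ops`, so an OSD object, a
block device or an inode backend would each supply its own pair of
functions.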

I see two immediate possible candidates for the ORE:
  - One is an md replacement for static multi-device topologies.
  - The second is BTRFS, which wanted a RAID5/RAID6 implementation and, last I asked, does not have one.
    This could fit well into their structures.

OK, now did I just shoot myself in the foot?

Thanks
Boaz
