* DDF Trial Use draft specification now publicly available
@ 2004-03-11 15:45 Matt Domsch
2004-03-11 16:08 ` Samuel Davidoff
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Matt Domsch @ 2004-03-11 15:45 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]
The Disk Data Format (DDF) trial use draft spec has now been published
by SNIA. This is the common on-disk metadata format that
RAID vendors have been driving towards, which should allow one to move
disks from one vendor's RAID controller to another vendor's RAID
controller without the backup-rebuild-restore that's currently needed.
http://www.snia.org/tech_activities/ddftwg
From the doc:
Revision 00.45.00
Publication of this Trial-Use Draft Specification for trial use and
comment has been approved by the SNIA Technical Council and the Common
RAID Disk Data Format Technical Working Group. Distribution of this
draft specification for comment shall not continue beyond (2) months
from the date of publication. It is expected, but not certain that
following this (2) month period, this draft specification, revised as
necessary will be submitted to the SNIA Membership and/or Technical
Council for final approval. Suggestions for revision should be directed
to the Disk Data Format TWG at ddftwg@snia.org.
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 15:45 DDF Trial Use draft specification now publicly available Matt Domsch
@ 2004-03-11 16:08 ` Samuel Davidoff
2004-03-11 16:18 ` Matt Domsch
2004-03-11 20:43 ` Jeff Garzik
2004-03-11 22:07 ` Jeff Garzik
2 siblings, 1 reply; 9+ messages in thread
From: Samuel Davidoff @ 2004-03-11 16:08 UTC (permalink / raw)
To: linux-raid
Are there plans to incorporate this into md, allowing for compatibility
between software and hardware arrays?
On a slightly off-topic note, is that at all possible now with any
md/hardware controller combinations?
> The Disk Data Format (DDF) trial use draft spec has now been published
> by SNIA. This is the common on-disk metadata format that
> RAID vendors have been driving towards, which should allow one to move
> disks from one vendor's RAID controller to another vendor's RAID
> controller without the backup-rebuild-restore that's currently needed.
>
> http://www.snia.org/tech_activities/ddftwg
>
> From the doc:
> Revision 00.45.00
> Publication of this Trial-Use Draft Specification for trial use and
> comment has been approved by the SNIA Technical Council and the Common
> RAID Disk Data Format Technical Working Group. Distribution of this
> draft specification for comment shall not continue beyond (2) months
> from the date of publication. It is expected, but not certain that
> following this (2) month period, this draft specification, revised as
> necessary will be submitted to the SNIA Membership and/or Technical
> Council for final approval. Suggestions for revision should be directed
> to the Disk Data Format TWG at ddftwg@snia.org.
>
> Thanks,
> Matt
>
> --
> Matt Domsch
> Sr. Software Engineer, Lead Engineer
> Dell Linux Solutions linux.dell.com & www.dell.com/linux
> Linux on Dell mailing lists @ http://lists.us.dell.com
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 16:08 ` Samuel Davidoff
@ 2004-03-11 16:18 ` Matt Domsch
0 siblings, 0 replies; 9+ messages in thread
From: Matt Domsch @ 2004-03-11 16:18 UTC (permalink / raw)
To: Samuel Davidoff; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 724 bytes --]
On Thu, Mar 11, 2004 at 11:08:53AM -0500, Samuel Davidoff wrote:
> Are there plans to incorporate this into md, allowing for compatibility
> between software and hardware arrays?
Scott Long and Justin Gibbs from Adaptec have been working on an
"Enhanced MD" which allows for plugging in different metadata formats
to MD, including md-traditional, DDF, and Adaptec's own format. It
was briefly discussed here a month or so ago. Unfortunately, at the
time the DDF spec wasn't yet publicly available for review and comment.
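For illustration only, a rough sketch of what such a pluggable metadata
interface might look like (names and signatures are invented here, not
taken from the actual Enhanced MD code):
    /* Hypothetical sketch -- not the real Enhanced MD API. */
    struct md_array;                        /* opaque: one assembled array */
    struct md_format_ops {
        const char *name;                   /* "md-native", "ddf", "adaptec", ... */
        int (*probe)(int fd);               /* does this disk carry our metadata? */
        int (*load)(int fd, struct md_array *a);        /* parse superblock(s) */
        int (*save)(int fd, const struct md_array *a);  /* write state back */
    };
    /* The md core would walk a table of these, probe each member disk,
     * and hand the array to whichever handler recognizes it. */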
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 15:45 DDF Trial Use draft specification now publicly available Matt Domsch
2004-03-11 16:08 ` Samuel Davidoff
@ 2004-03-11 20:43 ` Jeff Garzik
2004-03-12 7:27 ` Scott Long
2004-03-11 22:07 ` Jeff Garzik
2 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2004-03-11 20:43 UTC (permalink / raw)
To: Matt Domsch; +Cc: linux-raid
Matt Domsch wrote:
> The Disk Data Format (DDF) trial use draft spec has now been published
> by SNIA. This is the common on-disk metadata format that
> RAID vendors have been driving towards, which should allow one to move
> disks from one vendor's RAID controller to another vendor's RAID
> controller without the backup-rebuild-restore that's currently needed.
>
> http://www.snia.org/tech_activities/ddftwg
Thanks for posting this.
I haven't even had to read past the "locality" section to find brokenness. DDF
RAID groups (a.k.a. each RAID array) must store information about other
RAID groups "on the controller." There is also apparently
per-controller state information one must care about. While I
understand why they would want this, this also means the format is quite
a bit less flexible than md, and in some respects, more difficult to
work with.
/me continues reading...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 15:45 DDF Trial Use draft specification now publicly available Matt Domsch
2004-03-11 16:08 ` Samuel Davidoff
2004-03-11 20:43 ` Jeff Garzik
@ 2004-03-11 22:07 ` Jeff Garzik
2004-03-12 7:55 ` Scott Long
2 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2004-03-11 22:07 UTC (permalink / raw)
To: linux-raid; +Cc: Matt Domsch
Ok, having read the whole spec... it reads like DDF is basically a
hardware RAID format. That's not a bad thing, just something that must
be understood from the beginning.
Thus, when considering this as a "format a hardware RAID controller
would use, to store all its global and per-array state", here are some
inherent limitations that apply to DDF, because the global state and
per-array state are considered from the viewpoint of being limited in
scope to that of a "controller" -- an entity whose definition is
amorphous at best when considering things from a Linux blkdev
perspective. Linux block devices are quite controller-independent,
leading to crazy (but flexible) combinations such as md RAID1, with 1
SCSI disk component, 1 ATA disk component, and 1 remote server using
nbd, the network block device, serving as the hot spare.
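To make the combination concrete, an /etc/raidtab along these lines
(device names are illustrative) describes exactly such a mixed array:
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          1
        persistent-superblock   1
        chunk-size              64
        device                  /dev/sda1    # SCSI component
        raid-disk               0
        device                  /dev/hdc1    # ATA component
        raid-disk               1
        device                  /dev/nbd0    # nbd export, used as the hot spare
        spare-disk              0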
* the scope of some metadata is per-controller. There doesn't appear to
be a notion of RAID arrays that can span controllers.
* "minimal" metadata from RAID arrays B, C, D, .. might be stored in
RAID array A. This is perfectly understandable -- the hardware RAID
controller must store its controller-wide state somewhere. But it is
unclear what happens to this metadata when a RAID array is moved from
one vendor's hardware RAID to another. It is also unclear what the
behavior of a purely hardware-independent implementation should be.
* the definition/implementation of multiple controllers on a single HBA
is unclear.
* single level of stacking. To support RAID0+1 and other "stacked"
solutions, DDF includes an awful hack: "secondary RAID level." Instead
of truly supporting the notion of nested (stacked) RAID arrays, they
instead supported a single additional level of stacking - secondary raid
level. Several of the limitations related to secondary RAID level are
unclear in the DDF spec.
There is also an IMO significant flaw:
The designers made a good move in choosing GUIDs. Then they turned
around and created a 32-bit identifier which is far less unique, and far
more troublesome than the GUID. This 32-bit identifier is what is used
in the virtual disk (a.k.a. RAID array) description, when listing its
components. Argh! These identifiers need to be eliminated, and the
GUIDs that each element _already_ provides should be used as the unique
identifier of choice in all lists, etc.
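To illustrate the redundancy (field names and sizes below are invented
for the example, not copied from the spec): every element already
carries a GUID, yet the virtual disk's member list is keyed by the
extra 32-bit reference.
    /* Illustrative only -- not the actual DDF on-disk layout. */
    struct pd_entry {
        unsigned char guid[24];        /* already globally unique */
        unsigned int  reference;       /* the extra 32-bit identifier */
    };
    struct vd_config {
        unsigned char vd_guid[24];
        unsigned int  member_ref[8];   /* members named by 32-bit ref, not GUID */
    };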
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 20:43 ` Jeff Garzik
@ 2004-03-12 7:27 ` Scott Long
2004-03-12 21:07 ` Jeff Garzik
0 siblings, 1 reply; 9+ messages in thread
From: Scott Long @ 2004-03-12 7:27 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Matt Domsch, linux-raid
Jeff Garzik wrote:
> Matt Domsch wrote:
> > The Disk Data Format (DDF) trial use draft spec has now been published
> > by SNIA. This is the common on-disk metadata format that
> > RAID vendors have been driving towards, which should allow one to move
> > disks from one vendor's RAID controller to another vendor's RAID
> > controller without the backup-rebuild-restore that's currently needed.
> >
> > http://www.snia.org/tech_activities/ddftwg
>
> Thanks for posting this.
>
> I haven't even had to read past the "locality" section to find brokenness. DDF
> RAID groups (a.k.a. each RAID array) must store information about other
> RAID groups "on the controller." There is also apparently
> per-controller state information one must care about. While I
> understand why they would want this, this also means the format is quite
> a bit less flexible than md, and in some respects, more difficult to
> work with.
>
Complexity != brokenness
Recording information about all disks and arrays on every disk means
that you can detect when a whole array is missing. The more complex
array information means that you can express multi-level arrays in a
reasonable way. We've already spoken a little bit about this. There is
also quite a bit of information in the format that allows you to
validate each disk and determine how much you 'trust' it. While this
certainly adds complexity, it also strengthens the notion that RAID is
about ensuring integrity. Of course it doesn't slow down the actual
transform of a raid-0, so you can still measure its worthiness that
way =-)
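Here is a rough sketch of the "whole array is missing" check (structure
and names invented, not the DDF layout): since every disk records the
GUIDs of the arrays in its domain, any recorded array GUID with no
surviving member among the disks actually found must belong to a
missing array.
    #include <string.h>
    #define GUID_LEN 24
    struct disk_meta {
        unsigned char member_of[GUID_LEN];   /* array this disk belongs to */
    };
    /* Return 1 if an array GUID recorded in the domain has no member
     * among the disks that were actually discovered. */
    static int array_is_missing(const unsigned char *guid,
                                const struct disk_meta *found, int nfound)
    {
        for (int i = 0; i < nfound; i++)
            if (memcmp(found[i].member_of, guid, GUID_LEN) == 0)
                return 0;
        return 1;
    }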
As Matt Domsch mentioned earlier, Adaptec has been working on a DDF
implementation for Linux under MD. We are putting the final touches on
it right now (as I type this at 12:30am) and getting it ready for formal
testing. After that, we will release it to the Linux community for
review and inclusion. As an added benefit, our Enhanced MD will also
provide full (and Open) support for Adaptec HostRAID.
Scott
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-11 22:07 ` Jeff Garzik
@ 2004-03-12 7:55 ` Scott Long
0 siblings, 0 replies; 9+ messages in thread
From: Scott Long @ 2004-03-12 7:55 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-raid, Matt Domsch
Jeff Garzik wrote:
>
> Ok, having read the whole spec... it reads like DDF is basically a
> hardware RAID format. That's not a bad thing, just something that must
> be understood from the beginning.
>
That's funny, since I know of no less than two software implementations
of DDF that work fine.
> Thus, when considering this as a "format a hardware RAID controller
> would use, to store all its global and per-array state", here are some
> inherent limitations that apply to DDF, because the global state and
> per-array state are considered from the viewpoint of being limited in
> scope to that of a "controller" -- an entity whose definition is
> amorphous at best when considering things from a Linux blkdev
> perspective. Linux block devices are quite controller-independent,
> leading to crazy (but flexible) combinations such as md RAID1, with 1
> SCSI disk component, 1 ATA disk component, and 1 remote server using
> nbd, the network block device, serving as the hot spare.
>
> * the scope of some metadata is per-controller. There doesn't appear to
> be a notion of RAID arrays that can span controllers.
It's pretty non-typical for a RAID controller to support chaining.
However, the flaw that I see is that there seems to be an assumption
that there will only be one controller in a DDF domain. Thus,
Active-Active configurations that are common in the FC/SAN world make no
sense to DDF. The same goes for clustered SCSI and SAS. It's unfortunate
that this is such a gaping hole in the spec. We'll have to see if
it's a fatal flaw or if it can be fixed in a future version of the
spec.
In the context of software raid, the definition is a little fuzzier
and depends only on how interoperable you want to be. Does
a 'controller' mean the whole system, or just some logical segment
of it? Unfortunately, the Linux SCSI layer provides no concept of
a 'path', so it's very, very hard to determine the topology of your
storage from software. This is a limitation of Linux, not DDF.
>
> * "minimal" metadata from RAID arrays B, C, D, .. might be stored in
> RAID array A. This is perfectly understandable -- the hardware RAID
> controller must store its controller-wide state somewhere. But it is
> unclear what happens to this metadata when a RAID array is moved from
> one vendor's hardware RAID to another. It is also unclear what the
> behavior of a purely hardware-independent implementation should be.
What parts are you unclear on? DDF is meant to be a transportable
format; you can take disks from one DDF-compliant domain to another and
it 'Just Works'. Whether the receiving domain chooses to migrate the
metadata is up to the policy of the controller, as long as the
migration retains DDF compliance.
>
> * the definition/implementation of multiple controllers on a single HBA
> is unclear.
I'm not sure what you mean here. Are you talking about multiple
controllers talking to the same disk, like I mentioned above?
>
> * single level of stacking. To support RAID0+1 and other "stacked"
> solutions, DDF includes an awful hack: "secondary RAID level." Instead
> of truly supporting the notion of nested (stacked) RAID arrays, they
> instead supported a single additional level of stacking - secondary raid
> level. Several of the limitations related to secondary RAID level are
> unclear in the DDF spec.
In practice, arbitrary n-level stacking isn't terribly useful. Yes, it
would be nice if it cleanly supported this, but you've already
complained that the spec is too complex as it is ;-)
>
>
> There is also an IMO significant flaw:
>
> The designers made a good move in choosing GUIDs. Then they turned
> around and created a 32-bit identifier which is far less unique, and far
> more troublesome than the GUID. This 32-bit identifier is what is used
> in the virtual disk (a.k.a. RAID array) description, when listing its
> components. Argh! These identifiers need to be eliminated, and the
> GUIDs that each element _already_ provides should be used as the unique
> identifier of choice in all lists, etc.
Yeah, it is strange that they mix GUIDs and Unique Identifiers, but it
is not a fatal flaw. The GUID can be used to resolve collisions of
the Unique Id, since you know about every disk in your domain.
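A sketch of that resolution step (names invented, not spec structures):
look the member up by its 32-bit id, and disambiguate a collision with
the GUID that the array's own configuration already records for it.
    #include <string.h>
    #define GUID_LEN 24
    struct pd {
        unsigned int  ref;                /* 32-bit unique id */
        unsigned char guid[GUID_LEN];
    };
    static const struct pd *resolve_ref(unsigned int ref,
                                        const unsigned char *want_guid,
                                        const struct pd *disks, int n)
    {
        for (int i = 0; i < n; i++)
            if (disks[i].ref == ref &&
                memcmp(disks[i].guid, want_guid, GUID_LEN) == 0)
                return &disks[i];         /* id and GUID both match */
        return NULL;
    }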
Anyways, DDF is certainly not perfect, but it's decent for the target
market and not terribly difficult to implement in software.
Scott
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-12 7:27 ` Scott Long
@ 2004-03-12 21:07 ` Jeff Garzik
2004-03-12 22:27 ` Scott Long
0 siblings, 1 reply; 9+ messages in thread
From: Jeff Garzik @ 2004-03-12 21:07 UTC (permalink / raw)
To: Scott Long; +Cc: Matt Domsch, linux-raid
Scott Long wrote:
> Complexity != brokenness
Agreed, but complexity is also one of the key things to resist...
Favorite Linus maxims include "don't overdesign" and "do
what you need to do, and no more" ;-)
> Recording information about all disks and arrays on every disk means
> that you can detect when a whole array is missing. The more complex
> array information means that you can express multi-level arrays in a
> reasonable way. We've already spoken a little bit about this. There is
I think you're missing an overall point about scope. DDF configuration
is not global, from the standpoint of a Linux system. md's
configuration (raidtab, etc) is.
Therefore, from the standpoint of the entire Linux system, the
definition of "all disks and arrays" varies from one RAID "domain" to another.
This artificial partitioning into domains is obviously required for
hardware RAID -- a single controller does not know anything more than
what spindles are attached to it.
But... this partitioning, the _initialization and ordering of domains_,
varies from controller to controller, and sometimes AFAICS from one
setup to another. The Linux kernel now has to organize all that into a
coherent picture.
"Getting RAID domains right" is where I see a lot of complexity... The
simplicity of md's raidtab provides the same thing in userspace -- just
list the ordering in a text file. The kernel only really cares about
auto-running the RAID upon which your root filesystem (and raidtab) lives.
> also quite a bit of information in the format that allows you to
> validate each disk and determine how much you 'trust' it. While this
> certainly adds complexity, it also strengthens the notion that RAID is
> about ensuring integrity. Of course it doesn't slow down the actual
> transform of a raid-0, so you can still measure its worthiness that
> way =-)
Yeah, the basic r/w fast path isn't too tough, it's not only error
handling but also the minute variations in RAID formats where things
get fun :)
> As Matt Domsch mentioned earlier, Adaptec has been working on a DDF
> implementation for Linux under MD. We are putting the final touches on
> it right now (as I type this at 12:30am) and getting it ready for formal
> testing. After that, we will release it to the Linux community for
> review and inclusion. As an added benefit, our Enhanced MD will also
> provide full (and Open) support for Adaptec HostRAID.
Yep, looking forward to it :)
Jeff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DDF Trial Use draft specification now publicly available
2004-03-12 21:07 ` Jeff Garzik
@ 2004-03-12 22:27 ` Scott Long
0 siblings, 0 replies; 9+ messages in thread
From: Scott Long @ 2004-03-12 22:27 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Matt Domsch, linux-raid
Jeff Garzik wrote:
> Scott Long wrote:
> > Complexity != brokenness
>
> Agreed, but complexity is also one of the key things to resist...
> Favorite Linus maxims include "don't overdesign" and "do
> what you need to do, and no more" ;-)
Doing the job right might lead to complexity. RAID is about doing the
job right. The 'R' in RAID doesn't stand for 'Fast' or 'Simple'.
>
>
> > Recording information about all disks and arrays on every disk means
> > that you can detect when a whole array is missing. The more complex
> > array information means that you can express multi-level arrays in a
> > reasonable way. We've already spoken a little bit about this. There is
>
> I think you're missing an overall point about scope. DDF configuration
> is not global, from the standpoint of a Linux system. md's
> configuration (raidtab, etc) is.
>
> Therefore, from the standpoint of the entire Linux system, the
> definition of "all disks and arrays" varies from one RAID "domain" to another.
>
> This artificial partitioning into domains is obviously required for
> hardware RAID -- a single controller does not know anything more than
> what spindles are attached to it.
>
> But... this partitioning, the _initialization and ordering of domains_,
> varies from controller to controller, and sometimes AFAICS from one
> setup to another. The Linux kernel now has to organize all that into a
> coherent picture.
>
The GUID _is_ unique, and is the authority. The spec even details how
it is to be made unique so that it (hopefully) never collides between
domains. Great care is taken in the spec to allow disks and arrays
to migrate between domains. The only flaw is that it doesn't allow
the concept of overlapped domains, or disks with multiple parents. But
that really isn't what you are talking about here.
As for domain scope within Linux, it is a flaw of Linux that the
raid stack can't reliably get pathing information to figure out its
topology (and thus its domain).
> "Getting RAID domains right" is where I see a lot of complexity... The
> simplicity of md's raidtab provides the same thing in userspace -- just
> list the ordering in a text file. The kernel only really cares about
> auto-running the RAID upon which your root filesystem (and raidtab) lives.
Linux raidtab doesn't allow you to boot off of your array. Yes, there
are some hacks out there to boot off of a single disk of a mirror and
let MD/DM/whatever take over and supplant it with the real raid device.
That doesn't do anything for you when you are talking about RAID0 or
RAID10.
>
>
> > also quite a bit of information in the format that allows you to
> > validate each disk and determine how much you 'trust' it. While this
> > certainly adds complexity, it also strengthens the notion that RAID is
> > about ensuring integrity. Of course it doesn't slow down the actual
> > transform of a raid-0, so you can still measure its worthiness that
> > way =-)
>
> Yeah, the basic r/w fast path isn't too tough, it's not only error
> handling but also the minute variations in RAID formats where things
> get fun :)
>
The role of the metadata is pretty insignificant in the scope of error
handling. 95% of the error handling code that we have in Enhanced MD
is independent of the metadata; the metadata code is only there to
record the state changes and decide if one state change leads to
another.
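A loose sketch of that split (names invented, not the Enhanced MD code):
the raid personality handles the failed I/O itself, and the metadata
module just records the new member state and reports whether that change
implies a further one.
    enum member_state { MEMBER_ONLINE, MEMBER_FAILED, MEMBER_REBUILDING };
    struct array_state {
        int n_members;
        int n_failed;
        enum member_state member[16];
    };
    /* Record a member failure; return 1 if the change leads to another
     * (e.g. the array is now degraded and a spare rebuild can start). */
    static int meta_record_failure(struct array_state *a, int idx)
    {
        if (a->member[idx] == MEMBER_FAILED)
            return 0;                     /* already recorded */
        a->member[idx] = MEMBER_FAILED;
        a->n_failed++;
        /* a real implementation would also write the updated metadata
         * to the surviving members here */
        return a->n_failed == 1;          /* first failure: a rebuild can proceed */
    }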
Scott
^ permalink raw reply [flat|nested] 9+ messages in thread