* Re: Proposed enhancements to MD [not found] <40033D02.8000207@adaptec.com> @ 2004-01-13 18:44 ` Jeff Garzik 2004-01-13 19:01 ` John Bradford ` (3 more replies) 2004-01-14 23:07 ` Neil Brown 1 sibling, 4 replies; 33+ messages in thread From: Jeff Garzik @ 2004-01-13 18:44 UTC (permalink / raw) To: Scott Long; +Cc: Linux Kernel, linux-raid, Neil Brown Scott Long wrote: > I'm going to push these changes out in phases in order to keep the risk > and churn to a minimum. The attached patch is for the partition > support. It was originally from Ingo Molnar, but has changed quite a > bit due to the radical changes in the disk/block layer in 2.6. The 2.4 > version works quite well, while the 2.6 version is fairly fresh. One > problem that I have with it is that the created partitions show up in > /proc/partitions after running fdisk, but not after a reboot. You sorta hit a bad time for 2.4 development. Even though my employer (Red Hat), Adaptec, and many others must continue to support new products on 2.4.x kernels, kernel development has shifted to 2.6.x (and soon 2.7.x). In general, you want a strategy of "develop on latest, then backport if needed." Once a solution is merged into the latest kernel, it automatically appears in many companies' products (and perhaps more importantly) product roadmaps. Otherwise you will design various things into your software that have already been handled differently in the future, thus creating an automatically-obsolete solution and support nightmare. Now, addressing your specific issues... > While MD is fairly functional and clean, there are a number of enhancements to it that we have been working on for a while and would > like to push out to the community for review and integration. These > include: > > - partition support for md devices: MD does not support the concept of > fdisk partitions; the only way to approximate this right now is by > creating multiple arrays on the same media. 
Fixing this is required > for not only feature-completeness, but to allow our BIOS to recognise > the partitions on an array and properly boot them as it would boot a > normal disk. Neil Brown has already done a significant amount of research into this topic. Given this, and his general status as md maintainer, you should definitely make sure he's kept in the loop. Partitioning for md was discussed in this thread: http://lkml.org/lkml/2003/11/13/182 In particular note Al Viro's response to Neil, in addition to Neil's own post. And I could have _sworn_ that Neil already posted a patch to do partitions in md, but maybe my memory is playing tricks on me. > - generic device arrival notification mechanism: This is needed to > support device hot-plug, and allow arrays to be automatically > configured regardless of when the md module is loaded or initialized. > RedHat EL3 has a scaled down version of this already, but it is > specific to MD and only works if MD is statically compiled into the > kernel. A general mechanism will benefit MD as well as any other > storage system that wants hot-arrival notices. This would be via /sbin/hotplug, in the Linux world. SCSI already does this, I think, so I suppose something similar would happen for md. > - RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O > that spans a chunk boundary. Modifications are needed so that it can > take a request and break it up into 1 or more per-disk requests. I thought that raid0 was one of the few that actually did bio splitting correctly? Hum, maybe this is a 2.4-only issue. Interesting, and agreed, if so... > - Metadata abstraction: We intend to support multiple on-disk metadata > formats, along with the 'native MD' format. To do this, specific > knowledge of MD on-disk structures must be abstracted out of the core > and personalities modules. > - DDF Metadata support: Future products will use the 'DDF' on-disk > metadata scheme. 
These products will be bootable by the BIOS, but > must have DDF support in the OS. This will plug into the abstraction > mentioned above. Neil already did the work to make 'md' support multiple types of superblocks, but I'm not sure if we want to hack 'md' to support the various vendor RAIDs out there. DDF support we _definitely_ want, of course. DDF follows a very nice philosophy: open[1] standard with no vendor lock-in. IMO, your post/effort all boils down to an open design question: device mapper or md, for doing stuff like vendor-raid1 or vendor-raid5? And is it even possible to share (for example) a raid5 engine among all the various vendor RAID5's? Jeff [1] well, developed in secret, but published openly. Not quite up to Linux's standards, but decent for the h/w world. ^ permalink raw reply [flat|nested] 33+ messages in thread
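Whether one raid5 engine can serve every vendor format is plausible in principle because the parity arithmetic itself is vendor-neutral: RAID-5 parity is plain XOR over the data chunks of a stripe, and only the chunk layout, parity rotation, and on-disk metadata differ between formats. A toy sketch of that shared core (Python used purely as illustration; none of this is kernel code):

```python
# Toy model of the vendor-neutral core of a RAID-5 engine: parity is
# plain XOR over the data chunks of a stripe. Vendor formats differ in
# layout (parity rotation) and metadata, not in this arithmetic.

def parity(chunks):
    """XOR equal-sized data chunks into one parity chunk."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)

def reconstruct(survivors, parity_chunk):
    """Rebuild a single missing chunk from the survivors plus parity."""
    return parity(survivors + [parity_chunk])

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p = parity([d0, d1, d2])
assert reconstruct([d0, d2], p) == d1  # the disk holding d1 failed
```

A vendor "personality" would wrap this sort of core with its own stripe layout and superblock handling, which is exactly the open design question raised above.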
* Re: Proposed enhancements to MD 2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik @ 2004-01-13 19:01 ` John Bradford 2004-01-13 19:41 ` Matt Domsch ` (2 subsequent siblings) 3 siblings, 0 replies; 33+ messages in thread From: John Bradford @ 2004-01-13 19:01 UTC (permalink / raw) To: Jeff Garzik, Scott Long; +Cc: Linux Kernel, linux-raid, Neil Brown > [1] well, developed in secret, but published openly. Not quite up to > Linux's standards, but decent for the h/w world. ..and patent-free? John. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik 2004-01-13 19:01 ` John Bradford @ 2004-01-13 19:41 ` Matt Domsch 2004-01-13 22:10 ` Arjan van de Ven 2004-01-16 9:31 ` Lars Marowsky-Bree 2004-01-13 20:41 ` Scott Long 2004-01-13 22:42 ` Luca Berra 3 siblings, 2 replies; 33+ messages in thread From: Matt Domsch @ 2004-01-13 19:41 UTC (permalink / raw) To: Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown On Tue, Jan 13, 2004 at 01:44:05PM -0500, Jeff Garzik wrote: > You sorta hit a bad time for 2.4 development. Even though my employer > (Red Hat), Adaptec, and many others must continue to support new > products on 2.4.x kernels, Indeed, enterprise class products based on 2.4.x kernels will need some form of solution here too. > kernel development has shifted to 2.6.x (and soon 2.7.x). > > In general, you want a strategy of "develop on latest, then backport if > needed." Ideally in 2.6 one can use device mapper, but DM hasn't been incorporated into 2.4 stock, I know it's not in RHEL 3, and I don't believe it's included in SLES8. Can anyone share thoughts on whether, if a DDF solution were built on top of DM, DM could be included in 2.4 stock, RHEL3, or SLES8? Otherwise, Adaptec will be stuck with two different solutions anyhow, one for 2.4 (they're proposing enhancing MD), and DM for 2.6. Thanks, Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 19:41 ` Matt Domsch @ 2004-01-13 22:10 ` Arjan van de Ven 2004-01-16 9:31 ` Lars Marowsky-Bree 1 sibling, 0 replies; 33+ messages in thread From: Arjan van de Ven @ 2004-01-13 22:10 UTC (permalink / raw) To: Matt Domsch; +Cc: Jeff Garzik, Scott Long, Linux Kernel, linux-raid, Neil Brown [-- Attachment #1: Type: text/plain, Size: 683 bytes --] > Ideally in 2.6 one can use device mapper, but DM hasn't been > incorporated into 2.4 stock, I know it's not in RHEL 3, and I don't > believe it's included in SLES8. Can anyone share thoughts on if a DDF > solution were built on top of DM, that DM could be included in 2.4 > stock, RHEL3, or SLES8? Otherwise, Adaptec will be stuck with two > different solutions anyhow, one for 2.4 (they're proposing enhancing > MD), and DM for 2.6. Well it's either putting DM into 2.4 or forcing some sort of partitioned MD into 2.4. My strong preference would be DM in that case since it's already in 2.6 and is actually designed for the multiple-superblock-formats case. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 19:41 ` Matt Domsch 2004-01-13 22:10 ` Arjan van de Ven @ 2004-01-16 9:31 ` Lars Marowsky-Bree 2004-01-16 9:57 ` Arjan van de Ven 1 sibling, 1 reply; 33+ messages in thread From: Lars Marowsky-Bree @ 2004-01-16 9:31 UTC (permalink / raw) To: Matt Domsch, Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown On 2004-01-13T13:41:07, Matt Domsch <Matt_Domsch@dell.com> said: > > You sorta hit a bad time for 2.4 development. Even though my employer > > (Red Hat), Adaptec, and many others must continue to support new > > products on 2.4.x kernels, > Indeed, enterprise class products based on 2.4.x kernels will need > some form of solution here too. Yes, namely not supporting this feature and moving onwards to 2.6 in their next release ;-) Sincerely, Lars Marowsky-Brée <lmb@suse.de> -- High Availability & Clustering \ ever tried. ever failed. no matter. SUSE Labs | try again. fail again. fail better. Research & Development, SUSE LINUX AG \ -- Samuel Beckett - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-16 9:31 ` Lars Marowsky-Bree @ 2004-01-16 9:57 ` Arjan van de Ven 0 siblings, 0 replies; 33+ messages in thread From: Arjan van de Ven @ 2004-01-16 9:57 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Matt Domsch, Jeff Garzik, Scott Long, Linux Kernel, linux-raid, Neil Brown [-- Attachment #1: Type: text/plain, Size: 546 bytes --] On Fri, 2004-01-16 at 10:31, Lars Marowsky-Bree wrote: > On 2004-01-13T13:41:07, > Matt Domsch <Matt_Domsch@dell.com> said: > > > > You sorta hit a bad time for 2.4 development. Even though my employer > > > (Red Hat), Adaptec, and many others must continue to support new > > > products on 2.4.x kernels, > > Indeed, enterprise class products based on 2.4.x kernels will need > > some form of solution here too. > > Yes, namely not supporting this feature and moving onwards to 2.6 in > their next release ;-) hear hear [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik 2004-01-13 19:01 ` John Bradford 2004-01-13 19:41 ` Matt Domsch @ 2004-01-13 20:41 ` Scott Long 2004-01-13 22:33 ` Jure Pečar 2004-01-14 15:52 ` Kevin Corry 2004-01-13 22:42 ` Luca Berra 3 siblings, 2 replies; 33+ messages in thread From: Scott Long @ 2004-01-13 20:41 UTC (permalink / raw) To: Jeff Garzik; +Cc: Linux Kernel, linux-raid, Neil Brown Jeff Garzik wrote: > Scott Long wrote: > >>I'm going to push these changes out in phases in order to keep the > > risk > >>and churn to a minimum. The attached patch is for the partition >>support. It was originally from Ingo Molnar, but has changed quite a >>bit due to the radical changes in the disk/block layer in 2.6. The > > 2.4 > >>version works quite well, while the 2.6 version is fairly fresh. One >>problem that I have with it is that the created partitions show up in >>/proc/partitions after running fdisk, but not after a reboot. > > > You sorta hit a bad time for 2.4 development. Even though my employer > (Red Hat), Adaptec, and many others must continue to support new > products on 2.4.x kernels, kernel development has shifted to 2.6.x (and > soon 2.7.x). > > In general, you want a strategy of "develop on latest, then backport if > needed." Once a solution is merged into the latest kernel, it > automatically appears in many companies' products (and perhaps more > importantly) product roadmaps. Otherwise you will design various things > > into your software that have already been handled different in the > future, thus creating an automatically-obsolete solution and support > nightmare. > Oh, I understand completely. This work has actually been going on for a number of years in an on-and-off fashion. I'm just the latest person to pick it up, and I happened to pick it up right when the big transition to 2.6 happened. > Now, addressing your specific issues... 
> > >>While MD is fairly functional and clean, there are a number of > > enhancements to it that we have been working on for a while and would > >>like to push out to the community for review and integration. These >>include: >> >>- partition support for md devices: MD does not support the concept > > of > >> fdisk partitions; the only way to approximate this right now is by >> creating multiple arrays on the same media. Fixing this is required >> for not only feature-completeness, but to allow our BIOS to > > recognise > >> the partitions on an array and properly boot them as it would boot a >> normal disk. > > > Neil Brown has already done a significant amount of research into this > topic. Given this, and his general status as md maintainer, you should > definitely make sure he's kept in the loop. > > Partitioning for md was discussed in this thread: > http://lkml.org/lkml/2003/11/13/182 > > In particular note Al Viro's response to Neil, in addition to Neil's own > > post. > > And I could have _sworn_ that Neil already posted a patch to do > partitions in md, but maybe my memory is playing tricks on me. > I thought that I had attached a patch to the end of my last mail, but I could have messed it up. The work to do partitioning in 2.6 looks to be incredibly less significant than in 2.4, thankfully =-) > > >>- generic device arrival notification mechanism: This is needed to >> support device hot-plug, and allow arrays to be automatically >> configured regardless of when the md module is loaded or > > initialized. > >> RedHat EL3 has a scaled down version of this already, but it is >> specific to MD and only works if MD is statically compiled into the >> kernel. A general mechanism will benefit MD as well as any other >> storage system that wants hot-arrival notices. > > > This would be via /sbin/hotplug, in the Linux world. SCSI already does > this, I think, so I suppose something similar would happen for md. 
> A problem that we've encountered, though, is the following sequence: 1) md is initialized during boot 2) drives X Y and Z are probed during boot 3) root fs exists on array [X Y Z], but md didn't see them show up, so it didn't auto-configure the array I'm not sure how this can be addressed by a userland daemon. Remember that we are focused on providing RAID during boot; configuring a secondary array after boot is a much easier problem. RHEL3 already has a mechanism to address this via the md_autodetect_dev() hook. This gets called by the partition code when partition entities are discovered. However, it is a static method, so it only works when md is compiled into the kernel. Our proposal is to turn this into a generic registration mechanism, where md can register as a listener. When it does that, it gets a list of previously announced devices, along with future devices as they are discovered. The code to do this is pretty small and simple. The biggest question is whether to implement it by enhancing add_partition(), or create a new call (i.e. device_register_partition() ), like is done in RHEL3. > > >>- RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O >> that spans a chunk boundary. Modifications are needed so that it > > can > >> take a request and break it up into 1 or more per-disk requests. > > > I thought that raid0 was one of the few that actually did bio splitting > correctly? Hum, maybe this is a 2.4-only issue. Interesting, and > agreed, if so... > This is definitely still a problem in 2.6.1 > > >>- Metadata abstraction: We intend to support multiple on-disk > > metadata > >> formats, along with the 'native MD' format. To do this, specific >> knowledge of MD on-disk structures must be abstracted out of the > > core > >> and personalities modules. > > >>- DDF Metadata support: Future products will use the 'DDF' on-disk >> metadata scheme. These products will be bootable by the BIOS, but >> must have DDF support in the OS. 
This will plug into the > > abstraction > >> mentioned above. > > > Neil already did the work to make 'md' support multiple types of > superblocks, but I'm not sure if we want to hack 'md' to support the > various vendor RAIDs out there. DDF support we _definitely_ want, of > course. DDF follows a very nice philosophy: open[1] standard with no > vendor lock-in. > > IMO, your post/effort all boils down to an open design question: device > > mapper or md, for doing stuff like vendor-raid1 or vendor-raid5? And is it > > even possible to share (for example) a raid5 engine among all the > various vendor RAID5's? > The stripe and parity format is not the problem here; md can be enhanced to support different stripe and parity rotation sequences without much trouble. Also, think beyond just DDF. Having pluggable metadata personalities means that a module can be written for the existing Adaptec RAID products too (like the HostRAID functionality on our U320 adapters). It also means that you can write personality modules for other vendors, and even hardware RAID solutions. Imagine having a PCI RAID card fail, then plugging the drives directly into your computer and having the array 'Just Work'. As for the question of DM vs. MD, I think that you have to consider that DM right now has no concept of storing configuration data on the disk (at least that I can find, please correct me if I'm wrong). I think that DM will make a good LVM-like layer on top of MD, but I don't see it replacing MD right now. Scott ^ permalink raw reply [flat|nested] 33+ messages in thread
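The registration mechanism Scott outlines above — a listener that, upon registering, is handed every previously announced device and then hears about new ones as they appear — can be sketched in miniature. All names below are hypothetical illustrations of the semantics only, not the RHEL3 md_autodetect_dev() hook or any real kernel interface:

```python
# Sketch of the proposed arrival-notification semantics: a late
# registrant is replayed all earlier announcements, so md can
# assemble arrays correctly no matter when its module loads.
# Hypothetical names throughout; this is not kernel code.

class PartitionNotifier:
    def __init__(self):
        self._seen = []        # devices announced so far
        self._listeners = []

    def announce(self, dev):
        """Called by the partition code when a new partition appears."""
        self._seen.append(dev)
        for callback in self._listeners:
            callback(dev)

    def register(self, callback):
        """Register a listener; replay every past announcement first."""
        for dev in self._seen:
            callback(dev)
        self._listeners.append(callback)

notifier = PartitionNotifier()
notifier.announce("sda1")        # probed before md loads
notifier.announce("sdb1")
found = []
notifier.register(found.append)  # md loads late, still sees both
notifier.announce("sdc1")
assert found == ["sda1", "sdb1", "sdc1"]
```

The replay-on-register step is what removes the ordering dependency between driver probing and md module load described in the boot sequence above.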
* Re: Proposed enhancements to MD 2004-01-13 20:41 ` Scott Long @ 2004-01-13 22:33 ` Jure Pečar 2004-01-13 22:44 ` Scott Long 2004-01-13 22:56 ` viro 1 sibling, 2 replies; 33+ messages in thread From: Jure Pečar @ 2004-01-13 22:33 UTC (permalink / raw) To: Scott Long; +Cc: jgarzik, linux-kernel, linux-raid, neilb On Tue, 13 Jan 2004 13:41:07 -0700 Scott Long <scott_long@adaptec.com> wrote: > A problem that we've encountered, though, is the following sequence: > > 1) md is initialized during boot > 2) drives X Y and Z are probed during boot > 3) root fs exists on array [X Y Z], but md didn't see them show up, > so it didn't auto-configure the array > > I'm not sure how this can be addressed by a userland daemon. Remember > that we are focused on providing RAID during boot; configuring a > secondary array after boot is a much easier problem. Looking at this chicken-and-egg problem of booting from an array from an administrator's point of view ... What do you guys think about Intel's EFI? I think it would be the most appropriate place to put a piece of code that would scan the disks, assemble any arrays and present them to the OS as bootable devices ... If we're going to get a common metadata layout, that would be even easier. Thoughts? -- Jure Pečar - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 22:33 ` Jure Pečar @ 2004-01-13 22:44 ` Scott Long 2004-01-13 22:56 ` viro 1 sibling, 0 replies; 33+ messages in thread From: Scott Long @ 2004-01-13 22:44 UTC (permalink / raw) To: Jure Pečar; +Cc: jgarzik, linux-kernel, linux-raid, neilb Jure Pečar wrote: > On Tue, 13 Jan 2004 13:41:07 -0700 > Scott Long <scott_long@adaptec.com> wrote: > > >>A problem that we've encountered, though, is the following sequence: >> >>1) md is initialized during boot >>2) drives X Y and Z are probed during boot >>3) root fs exists on array [X Y Z], but md didn't see them show up, >> so it didn't auto-configure the array >> >>I'm not sure how this can be addressed by a userland daemon. Remember >>that we are focused on providing RAID during boot; configuring a >>secondary array after boot is a much easier problem. > > > Looking at this chicken-and-egg problem of booting from an array from > administrator's point of view ... > > What do you guys think about Intel's EFI? I think it would be the most > appropriate place to put a piece of code that would scan the disks, > assemble > any arrays and present them to the OS as bootable devices ... If we're > going > to get a common metadata layout, that would be even easier. > > Thoughts? > The BIOS already scans the disks, assembles the arrays, finds the boot sector, and presents the arrays to the loader/GRUB. Are you saying that EFI should be the interface through which the arrays are communicated, even after the kernel has booted? Is this possible right now? Scott ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 22:33 ` Jure Pečar 2004-01-13 22:44 ` Scott Long @ 2004-01-13 22:56 ` viro 1 sibling, 0 replies; 33+ messages in thread From: viro @ 2004-01-13 22:56 UTC (permalink / raw) To: Jure Pečar; +Cc: Scott Long, jgarzik, linux-kernel, linux-raid, neilb On Tue, Jan 13, 2004 at 11:33:20PM +0100, Jure Pečar wrote: > Looking at this chicken-and-egg problem of booting from an array from > administrator's point of view ... > > What do you guys think about Intel's EFI? I think it would be the most > appropriate place to put a piece of code that would scan the disks, assemble > any arrays and present them to the OS as bootable devices ... If we're going > to get a common metadata layout, that would be even easier. > > Thoughts? Why bother? We can have userland code running before any device drivers are initialized. And have access to * all normal system calls * normal writable filesystem already present (ramfs) * normal multitasking All of that - within the heavily tested codebase; regular kernel codepaths that are used all the time by everything. Oh, and it's portable. What's the benefit of doing that from EFI? Pure masochism? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-13 20:41 ` Scott Long 2004-01-13 22:33 ` Jure Pečar @ 2004-01-14 15:52 ` Kevin Corry 1 sibling, 0 replies; 33+ messages in thread From: Kevin Corry @ 2004-01-14 15:52 UTC (permalink / raw) To: Scott Long, Jeff Garzik; +Cc: Linux Kernel, linux-raid, Neil Brown On Tuesday 13 January 2004 14:41, Scott Long wrote: > A problem that we've encountered, though, is the following sequence: > > 1) md is inialized during boot > 2) drives X Y and Z are probed during boot > 3) root fs exists on array [X Y Z], but md didn't see them show up, > so it didn't auto-configure the array > > I'm not sure how this can be addressed by a userland daemon. Remember > that we are focused on providing RAID during boot; configuring a > secondary array after boot is a much easier problem. This can already be accomplished with an init-ramdisk (or initramfs in the future). These provide the ability to run user-space code before the real root filesystem is mounted. > > I thought that raid0 was one of the few that actually did bio splitting > > correctly? Hum, maybe this is a 2.4-only issue. Interesting, and > > agreed, if so... > > This is definitely still a problem in 2.6.1 Device-Mapper does bio-splitting correctly, and already has a "stripe" module. It's pretty trivial to set up a raid0 device with DM. > As for the question of DM vs. MD, I think that you have to consider that > DM right now has no concept of storing configuration data on the disk > (at least that I can find, please correct me if I'm wrong). I think > that DM will make a good LVM-like layer on top of MD, but I don't see it > replacing MD right now. The DM core has no knowledge of any metadata, but that doesn't mean its sub-modules ("targets" in DM-speak) can't. Example, the dm-snapshot target has to record enough on-disk metadata for its snapshots to be persistent across reboots. Same with the persistent dm-mirror target that Joe Thornber and co. have been working on. 
You could certainly write a raid5 target that recorded parity and other state information on disk. The real key here is keeping the metadata that simply identifies the device separate from the metadata that keeps track of the device state. Using the snapshot example again, DM keeps a copy of the remapping table on disk, so an existing snapshot can be initialized when it's activated at boot-time. But this remapping table is completely separate from the metadata that identifies a device/volume as being a snapshot. In fact, EVMS and LVM2 have completely different ways of identifying snapshots (which is done in user-space), yet they both use the same kernel snapshot module. -- Kevin Corry kevcorry@us.ibm.com http://evms.sourceforge.net/ ^ permalink raw reply [flat|nested] 33+ messages in thread
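For readers unfamiliar with the DM "stripe" module mentioned above, the chunked striping it performs reduces to a small address calculation. An illustrative sketch (not DM's actual code) of mapping a logical sector to a (disk, on-disk sector) pair:

```python
# Illustrative model of chunked raid0/striped address mapping:
# consecutive chunks rotate across the member disks. Not DM code.

def map_sector(sector, chunk_sectors, ndisks):
    """Map a logical sector to (disk index, sector on that disk)."""
    chunk = sector // chunk_sectors       # which chunk, array-wide
    offset = sector % chunk_sectors       # offset within the chunk
    disk = chunk % ndisks                 # chunks rotate across disks
    disk_chunk = chunk // ndisks          # which chunk on that disk
    return disk, disk_chunk * chunk_sectors + offset

# e.g. with 128-sector (64 KiB) chunks on two disks:
assert map_sector(0, 128, 2) == (0, 0)      # first chunk -> disk 0
assert map_sector(128, 128, 2) == (1, 0)    # second chunk -> disk 1
assert map_sector(256, 128, 2) == (0, 128)  # third chunk wraps to disk 0
```

The point of the thread's comparison is that this mapping is pure arithmetic; whether the kernel driver is md raid0 or a DM striped target, the layout-identifying metadata can live elsewhere.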
* Re: Proposed enhancements to MD 2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik ` (2 preceding siblings ...) 2004-01-13 20:41 ` Scott Long @ 2004-01-13 22:42 ` Luca Berra 3 siblings, 0 replies; 33+ messages in thread From: Luca Berra @ 2004-01-13 22:42 UTC (permalink / raw) To: Jeff Garzik; +Cc: Scott Long, Linux Kernel, linux-raid, Neil Brown On Tue, Jan 13, 2004 at 01:44:05PM -0500, Jeff Garzik wrote: >And I could have _sworn_ that Neil already posted a patch to do >partitions in md, but maybe my memory is playing tricks on me. he did, and a long time ago also. http://cgi.cse.unsw.edu.au/~neilb/patches/ >IMO, your post/effort all boils down to an open design question: device >mapper or md, for doing stuff like vendor-raid1 or vendor-raid5? And it >is even possible to share (for example) raid5 engine among all the >various vendor RAID5's? I would believe the way to go is having md raid personalities turned into device mapper targets. The issue is that raid personalities need to be able to constantly update the metadata, so a callback must be in place to communicate `exceptions` to a layer that sits above device-mapper and handles metadata. L. -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l. /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD [not found] <40033D02.8000207@adaptec.com> 2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik @ 2004-01-14 23:07 ` Neil Brown 2004-01-15 11:10 ` Norman Schmidt 2004-01-15 21:52 ` Matt Domsch 1 sibling, 2 replies; 33+ messages in thread From: Neil Brown @ 2004-01-14 23:07 UTC (permalink / raw) To: Scott Long; +Cc: linux-kernel, linux-raid On Monday January 12, scott_long@adaptec.com wrote: > All, > > Adaptec has been looking at the MD driver for a foundation for their > Open-Source software RAID stack. This will help us provide full > and open support for current and future Adaptec RAID products (as > opposed to the limited support through closed drivers that we have > now). Sounds like a great idea. > > While MD is fairly functional and clean, there are a number of > enhancements to it that we have been working on for a while and would > like to push out to the community for review and integration. These > include: It would help if you said up-front if you were thinking of 2.4 or 2.6 or 2.7 or all of whatever. I gather from subsequent emails in the thread that you are thinking of 2.6 and hoping for 2.4. It is definitely too late for any of this to go into kernel.org 2.4, but some of it could live in an external patch set that people or vendors can choose or not. > > - partition support for md devices: MD does not support the concept of > fdisk partitions; the only way to approximate this right now is by > creating multiple arrays on the same media. Fixing this is required > for not only feature-completeness, but to allow our BIOS to recognise > the partitions on an array and properly boot them as it would boot a > normal disk. Your attached patch is completely unacceptable as it breaks backwards compatibility. /dev/md1 (blockdev 9,1) changes from being the second md array to being the first partition of the first md array. I too would like to support partitions of md devices but there is no really elegant way to do it. 
I'm beginning to think the best approach is to use a new major number (which will be dynamically allocated because Linus has forbidden new static allocations). This should be fairly easy to do. A reasonable alternate is to use DM. As I understand it, DM can work with any sort of metadata (As metadata is handled by user-space) so this should work just fine. Note that kernel-based autodetection is seriously a thing of the past. As has been said already, it should be just as easy and much more manageable to do autodetection in early user-space. If it isn't, then we need to improve the early user-space tools. > > - generic device arrival notification mechanism: This is needed to > support device hot-plug, and allow arrays to be automatically > configured regardless of when the md module is loaded or initialized. > RedHat EL3 has a scaled down version of this already, but it is > specific to MD and only works if MD is statically compiled into the > kernel. A general mechanism will benefit MD as well as any other > storage system that wants hot-arrival notices. This has largely been covered, but just to add or clarify slightly: This is not an md issue. This is either a bus controller or userspace issue. 2.6 has a "hotplug" infrastructure and each bus should report hotplug events to userspace. If they don't, they should be enhanced so they do. If they do, then userspace needs to be told what to do with these events, and when to assemble devices into arrays. > > - RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O > that spans a chunk boundary. Modifications are needed so that it can > take a request and break it up into 1 or more per-disk requests. In 2.4 it cannot, but arguably doesn't need to. However I have a fairly straight-forward patch which supports raid0 request splitting. In 2.6, this should work properly already. > > - Metadata abstraction: We intend to support multiple on-disk metadata > formats, along with the 'native MD' format. 
To do this, specific > knowledge of MD on-disk structures must be abstracted out of the core > and personalities modules. In 2.4, this would be a massive amount of work and I don't recommend it. In 2.6, most of this is already done - the knowledge about superblock format is very localised. I would like to extend this so that a loadable module can add a new format. Patches welcome. Note that the kernel does need to know about the format of the superblock. DM can manage without knowing as its superblock is read-mostly and the very few updates (for reconfiguration) are managed by userspace. For raid1 and raid5 (which DM doesn't support), we need to update the superblock on errors and I think that is best done in the kernel. > > - DDF Metadata support: Future products will use the 'DDF' on-disk > metadata scheme. These products will be bootable by the BIOS, but > must have DDF support in the OS. This will plug into the abstraction > mentioned above. I'm looking forward to seeing the specs for DDF (but isn't it pretty dumb to develop a standard in a closed forum?). If DDF turns out to have real value I would be happy to have support for it in linux/md. NeilBrown ^ permalink raw reply [flat|nested] 33+ messages in thread
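The raid0 request splitting Neil refers to is, at heart, interval arithmetic: cut each incoming request at chunk boundaries so that every piece falls entirely within one chunk (and therefore on one disk). A sketch of the idea, illustrative only and not the actual patch:

```python
# Illustrative sketch of chunk-boundary request splitting, the core
# of the raid0 fix under discussion. Not the actual kernel patch.

def split_request(sector, nsectors, chunk_sectors):
    """Split [sector, sector + nsectors) into (sector, length) pieces
    that never cross a chunk boundary."""
    pieces = []
    while nsectors:
        # sectors remaining before the next chunk boundary
        room = chunk_sectors - (sector % chunk_sectors)
        step = min(room, nsectors)
        pieces.append((sector, step))
        sector += step
        nsectors -= step
    return pieces

# A 60-sector request starting at sector 100, with 128-sector chunks,
# crosses one boundary and is split in two:
assert split_request(100, 60, 128) == [(100, 28), (128, 32)]
```

Each resulting piece can then be mapped to its member disk independently, which is exactly the "1 or more per-disk requests" behaviour the original proposal asks for.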
* Re: Proposed enhancements to MD 2004-01-14 23:07 ` Neil Brown @ 2004-01-15 11:10 ` Norman Schmidt 2004-01-15 21:52 ` Matt Domsch 1 sibling, 0 replies; 33+ messages in thread From: Norman Schmidt @ 2004-01-15 11:10 UTC (permalink / raw) To: linux-raid Neil Brown wrote: > I'm beginning to think the best approach is to use a new major number > (which will be dynammically allocated because Linus has forbidden new > static allocations). This should be fairly easy to do. It seems that has changed: http://www.lanana.org/docs/device-list/ Norman. -- -- Norman Schmidt Institut fuer Physikal. u. Theoret. Chemie Dipl.-Chem. Friedrich-Alexander-Universitaet schmidt@naa.net Erlangen-Nuernberg IT-Systembetreuer Physikalische Chemie ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-14 23:07 ` Neil Brown 2004-01-15 11:10 ` Norman Schmidt @ 2004-01-15 21:52 ` Matt Domsch 2004-01-16 9:24 ` Lars Marowsky-Bree 1 sibling, 1 reply; 33+ messages in thread From: Matt Domsch @ 2004-01-15 21:52 UTC (permalink / raw) To: Neil Brown; +Cc: Scott Long, linux-kernel, linux-raid On Thu, Jan 15, 2004 at 10:07:34AM +1100, Neil Brown wrote: > On Monday January 12, scott_long@adaptec.com wrote: > > All, > > > > Adaptec has been looking at the MD driver for a foundation for their > > Open-Source software RAID stack. This will help us provide full > > and open support for current and future Adaptec RAID products (as > > opposed to the limited support through closed drivers that we have > > now). > > Sounds like a great idea. > > > - Metadata abstraction: We intend to support multiple on-disk metadata > > formats, along with the 'native MD' format. To do this, specific > > knowledge of MD on-disk structures must be abstracted out of the core > > and personalities modules. > > In 2.4, this would be a massive amount of work and I don't recommend > it. Scott has made a decent stab at doing so already in 2.4, and I've encouraged him to post the code he's got now. Since it's too intrusive for 2.4, perhaps it could be added in parallel, an "emd" driver, and one could choose to use emd to get the DDF functionality, or continue to use md without DDF. Here are some of the features I know I'm looking for, and I've compared solutions suggested. Comments/corrections welcome. * Solution works in both 2.4 and 2.6 kernels - less ideal if two different solutions are needed * RAID 0,1 DDF format * Bootable from degraded R1 * Online Rebuild * Mgmt tools/hooks - online create, delete, modify * Event notification/logging * Error Handling * Installation - simple i.e. 
without modifying distro installers significantly or at all; driver disk only is ideal From what I see about DM at present: * RAID 0,1 possible, dm-raid1 module in Sistina CVS needs to get merged * Boot drive - requires setup method early in boot process, either initrd or kernel code * Boot from degraded RAID1 requires setup method early in boot process, either initrd or kernel code. * Online Rebuild - dm-raid1 has this capability * mgmt tools/hooks - DM today has a way to communicate to the kernel the changes desired. What remains is userspace tools that read and modify DDF metadata and call into these hooks. * Event notification / logging - doesn't appear to exist in DM * Error handling - unclear if/how DM handles this. For instance, how is a disk failure on a dm-raid1 array handled? * Installation - RHEL3 doesn't include DM yet, significant installer work necessary for several distros. From what I see about md: * RAID 0,1 there today, no DDF * Boot drive - yes * Boot from degraded RAID1 - possible but may require manual intervention depending on BIOS capabilities * Online Rebuild - there today * mgmt tools/hooks - mdadm there today * Event notification / logging - mdadm there today * Error handling - there today * Installation - distro installer capable of this today From what I see about emd: * RAID 0,1 - code being developed by Adaptec today, DDF capable * Boot drive - yes * Boot from degraded RAID1 - possible without intervention due to Adaptec BIOS * Online Rebuild - there today * mgmt tools/hooks - mdadm there today, expect Adaptec to enhance mdadm to support DDF * Event notification / logging - mdadm there today * Error handling - there today * Installation - could be done with only a driver disk which adds the emd module. Am I way off base here? :-) Thanks, Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-15 21:52 ` Matt Domsch @ 2004-01-16 9:24 ` Lars Marowsky-Bree 2004-01-16 13:43 ` Matt Domsch 0 siblings, 1 reply; 33+ messages in thread From: Lars Marowsky-Bree @ 2004-01-16 9:24 UTC (permalink / raw) To: Matt Domsch, Neil Brown; +Cc: Scott Long, linux-kernel, linux-raid On 2004-01-15T15:52:21, Matt Domsch <Matt_Domsch@dell.com> said: > * Solution works in both 2.4 and 2.6 kernels > - less ideal if two different solutions are needed Sure, this is important. > * RAID 0,1 DDF format > * Bootable from degraded R1 We were looking at extending the boot loader (grub/lilo) to have additional support for R1 & multipath. (ie, booting from the first drive/path in the set where a consistent image can be read.) If the BIOS supports DDF too, this would get even better. For the boot drive, this is highly desirable! Do you know whether DDF can also support simple multipathing? > * Boot from degraded RAID1 requires setup method early in boot > process, either initrd or kernel code. This is needed with DDF too; we need to parse the DDF data somewhere after all. > From what I see about md: > * RAID 0,1 there today, no DDF Supporting additional metadata is desirable. For 2.6, this is already in the code, and I am looking forward to having this feature. > Am I way off base here? :-) I don't think so. But for 2.6, the functionality should go either into DM or MD, not into emd. I don't care which, really, both sides have good arguments, none of which _really_ matter from a user-perspective ;-) (If, in 2.7 time, we rip out MD and fully integrate it all into DM, then we can see further.) Sincerely, Lars Marowsky-Brée <lmb@suse.de> -- High Availability & Clustering \ ever tried. ever failed. no matter. SUSE Labs | try again. fail again. fail better. 
Research & Development, SUSE LINUX AG \ -- Samuel Beckett - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-16 9:24 ` Lars Marowsky-Bree @ 2004-01-16 13:43 ` Matt Domsch 2004-01-16 13:56 ` Lars Marowsky-Bree 0 siblings, 1 reply; 33+ messages in thread From: Matt Domsch @ 2004-01-16 13:43 UTC (permalink / raw) To: Lars Marowsky-Bree; +Cc: Neil Brown, Scott Long, linux-kernel, linux-raid On Fri, Jan 16, 2004 at 10:24:47AM +0100, Lars Marowsky-Bree wrote: > Do you know whether DDF can also support simple multipathing? Yes, the structure info for each physical disk allows for two (and only 2) paths to be represented. But it's pretty limited: bus/id/lun is the only SCSI-like path addressing described in the current draft. At the same time, there's a per-physical-disk GUID, such that if you find the same disk by multiple paths you can tell. There's room for enhancement/feedback in this space for certain. Thanks, Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-16 13:43 ` Matt Domsch @ 2004-01-16 13:56 ` Lars Marowsky-Bree 2004-01-16 14:06 ` Christoph Hellwig 0 siblings, 1 reply; 33+ messages in thread From: Lars Marowsky-Bree @ 2004-01-16 13:56 UTC (permalink / raw) To: Matt Domsch; +Cc: Neil Brown, Scott Long, linux-kernel, linux-raid On 2004-01-16T07:43:36, Matt Domsch <Matt_Domsch@dell.com> said: > > Do you know whether DDF can also support simple multipathing? > Yes, the structure info for each physical disk allows for two (and > only 2) paths to be represented. But it's pretty limited, describing > only SCSI-like paths with bus/id/lun only described in the current > draft. At the same time, there's a per-physical-disk GUID, such > that if you find the same disk by multiple paths you can tell. > There's room for enhancement/feedback in this space for certain. One would guess that for m-p, a mere media UUID would be completely enough; one can simply scan where those are found. If it encodes the bus/id/lun, I can foresee bad effects if the device enumeration changes because the HBAs get swapped in their slots ;-) Sincerely, Lars Marowsky-Brée <lmb@suse.de> -- High Availability & Clustering \ ever tried. ever failed. no matter. SUSE Labs | try again. fail again. fail better. Research & Development, SUSE LINUX AG \ -- Samuel Beckett ^ permalink raw reply [flat|nested] 33+ messages in thread
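[Editor's note: Lars's point — group paths by a per-media GUID rather than trusting bus/id/lun — can be illustrated with a small sketch. The device names and GUID values below are invented for illustration; this is not code from any of the drivers under discussion.]

```python
# Toy sketch: detect multipath by grouping discovered block devices on a
# per-media GUID rather than on bus/id/lun, which can change when HBAs
# are reseated. All data here is invented for illustration.

from collections import defaultdict

discovered = [
    ("sda", "guid-1111"),   # same physical disk ...
    ("sdc", "guid-1111"),   # ... seen via a second HBA
    ("sdb", "guid-2222"),
]

paths = defaultdict(list)
for dev, guid in discovered:
    paths[guid].append(dev)

# Any GUID seen on more than one device node is a multipath candidate.
multipath = {g: devs for g, devs in paths.items() if len(devs) > 1}
print(multipath)   # {'guid-1111': ['sda', 'sdc']}
```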
* Re: Proposed enhancements to MD 2004-01-16 13:56 ` Lars Marowsky-Bree @ 2004-01-16 14:06 ` Christoph Hellwig 2004-01-16 14:11 ` Matt Domsch 0 siblings, 1 reply; 33+ messages in thread From: Christoph Hellwig @ 2004-01-16 14:06 UTC (permalink / raw) To: Lars Marowsky-Bree Cc: Matt Domsch, Neil Brown, Scott Long, linux-kernel, linux-raid On Fri, Jan 16, 2004 at 02:56:46PM +0100, Lars Marowsky-Bree wrote: > If it encodes the bus/id/lun, I can foresee bad effects if the device > enumeration changes because the HBAs get swapped in their slots ;-) A bus/id/lun enumeration is completely bogus. Think (S)ATA, FC or iSCSI. So is there a pointer to the current version of the spec? Just reading these multi-path enumerations starts to give me the feeling this spec is designed rather badly.. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-16 14:06 ` Christoph Hellwig @ 2004-01-16 14:11 ` Matt Domsch 2004-01-16 14:13 ` Christoph Hellwig 0 siblings, 1 reply; 33+ messages in thread From: Matt Domsch @ 2004-01-16 14:11 UTC (permalink / raw) To: Christoph Hellwig Cc: Lars Marowsky-Bree, Neil Brown, Scott Long, linux-kernel, linux-raid On Fri, 16 Jan 2004, Christoph Hellwig wrote: > On Fri, Jan 16, 2004 at 02:56:46PM +0100, Lars Marowsky-Bree wrote: > > If it encodes the bus/id/lun, I can forsee bad effects if the device > > enumeration changes because the HBAs get swapped in their slots ;-) I believe it's just supposed to be a hint to the firmware that the drive has roamed from one physical slot to another. > A bus/id/lun enumeration is completely bogus. Think (S)ATA, FC or > iSCSI. > > So is there a pointer to the current version of the spec? Just reading > these multi-path enumerations start to give me the feeling this spec > is designed rather badly.. www.snia.org in the DDF TWG section, but requires you be a member of SNIA to see at present. The DDF chairperson is trying to make the draft publicly available, and if/when I see that happen I'll post a link to it here. Thanks, Matt -- Matt Domsch Sr. Software Engineer, Lead Engineer Dell Linux Solutions www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed enhancements to MD 2004-01-16 14:11 ` Matt Domsch @ 2004-01-16 14:13 ` Christoph Hellwig 0 siblings, 0 replies; 33+ messages in thread From: Christoph Hellwig @ 2004-01-16 14:13 UTC (permalink / raw) To: Matt Domsch Cc: Lars Marowsky-Bree, Neil Brown, Scott Long, linux-kernel, linux-raid On Fri, Jan 16, 2004 at 08:11:07AM -0600, Matt Domsch wrote: > www.snia.org in the DDF TWG section, but requires you be a member of SNIA > to see at present. The DDF chairperson is trying to make the draft > publicly available, and if/when I see that happen I'll post a link to it > here. Oops. That's not a good sign. /me tries to remember a sane spec coming from SNIA and fails.. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Proposed Enhancements to MD @ 2004-01-13 3:41 Scott Long 2004-01-13 10:24 ` Lars Marowsky-Bree 2004-01-13 14:19 ` Matt Domsch 0 siblings, 2 replies; 33+ messages in thread From: Scott Long @ 2004-01-13 3:41 UTC (permalink / raw) To: linux-raid (I already posted this to LKML a few hours ago but forgot to post it over here) All, Adaptec has been looking at the MD driver for a foundation for their Open-Source software RAID stack. This will help us provide full and open support for current and future Adaptec RAID products (as opposed to the limited support through closed drivers that we have now). While MD is fairly functional and clean, there are a number of enhancements to it that we have been working on for a while and would like to push out to the community for review and integration. These include: - partition support for md devices: MD does not support the concept of fdisk partitions; the only way to approximate this right now is by creating multiple arrays on the same media. Fixing this is required for not only feature-completeness, but to allow our BIOS to recognise the partitions on an array and properly boot them as it would boot a normal disk. - generic device arrival notification mechanism: This is needed to support device hot-plug, and allow arrays to be automatically configured regardless of when the md module is loaded or initialized. RedHat EL3 has a scaled down version of this already, but it is specific to MD and only works if MD is statically compiled into the kernel. A general mechanism will benefit MD as well as any other storage system that wants hot-arrival notices. - RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O that spans a chunk boundary. Modifications are needed so that it can take a request and break it up into 1 or more per-disk requests. - Metadata abstraction: We intend to support multiple on-disk metadata formats, along with the 'native MD' format. 
To do this, specific knowledge of MD on-disk structures must be abstracted out of the core and personalities modules. - DDF Metadata support: Future products will use the 'DDF' on-disk metadata scheme. These products will be bootable by the BIOS, but must have DDF support in the OS. This will plug into the abstraction mentioned above. I'm going to push these changes out in phases in order to keep the risk and churn to a minimum. The attached patch is for the partition support. It was originally from Ingo Molnar, but has changed quite a bit due to the radical changes in the disk/block layer in 2.6. The 2.4 version works quite well, while the 2.6 version is fairly fresh. One problem that I have with it is that the created partitions show up in /proc/partitions after running fdisk, but not after a reboot. Scott ^ permalink raw reply [flat|nested] 33+ messages in thread
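[Editor's note: the RAID-0 fix Scott describes — splitting a request that crosses a chunk boundary into one or more per-disk requests — can be sketched in userspace. The function below is illustrative only, not the md driver's code; it assumes a plain round-robin striping layout and works in sectors.]

```python
# Sketch (not actual md code): split a striped-array request that may
# span chunk boundaries into per-disk requests. All names are invented.

def split_request(start_sector, num_sectors, chunk_sectors, num_disks):
    """Yield (disk, disk_sector, count), one tuple per contiguous piece."""
    sector = start_sector
    remaining = num_sectors
    while remaining > 0:
        chunk = sector // chunk_sectors          # global chunk index
        offset = sector % chunk_sectors          # offset within that chunk
        disk = chunk % num_disks                 # round-robin striping
        stripe = chunk // num_disks              # stripe number on that disk
        count = min(chunk_sectors - offset, remaining)  # stop at boundary
        yield disk, stripe * chunk_sectors + offset, count
        sector += count
        remaining -= count

# A 16-sector request at sector 60 with 32-sector chunks on 2 disks:
# sectors 60-63 fall in chunk 1 (disk 1), 64-75 in chunk 2 (disk 0).
print(list(split_request(60, 16, 32, 2)))   # [(1, 28, 4), (0, 32, 12)]
```

A request contained entirely within one chunk yields a single per-disk request, which is the only case the 2.4 raid0 personality handled.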
* Re: Proposed Enhancements to MD 2004-01-13 3:41 Proposed Enhancements " Scott Long @ 2004-01-13 10:24 ` Lars Marowsky-Bree 2004-01-13 18:03 ` Scott Long 2004-01-13 14:19 ` Matt Domsch 1 sibling, 1 reply; 33+ messages in thread From: Lars Marowsky-Bree @ 2004-01-13 10:24 UTC (permalink / raw) To: Scott Long, linux-raid On 2004-01-12T20:41:54, Scott Long <scott_long@adaptec.com> said: Hi Scott, this is good to see! > - partition support for md devices: MD does not support the concept of > fdisk partitions; the only way to approximate this right now is by > creating multiple arrays on the same media. Fixing this is required > for not only feature-completeness, but to allow our BIOS to recognise > the partitions on an array and properly boot them as it would boot a > normal disk. I'm not too excited about this, because Device Mapper on top of md is much more flexible, but I see that users want it, and it should be pretty easy to add. > - generic device arrival notification mechanism: This is needed to > support device hot-plug, and allow arrays to be automatically > configured regardless of when the md module is loaded or initialized. > RedHat EL3 has a scaled down version of this already, but it is > specific to MD and only works if MD is statically compiled into the > kernel. A general mechanism will benefit MD as well as any other > storage system that wants hot-arrival notices. Yes. Is anything missing from the 2.6 & hotplug & udev solution which you require? > - RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O > that spans a chunk boundary. Modifications are needed so that it can > take a request and break it up into 1 or more per-disk requests. Agreed. > - Metadata abstraction: We intend to support multiple on-disk metadata > formats, along with the 'native MD' format. To do this, specific > knowledge of MD on-disk structures must be abstracted out of the core > and personalities modules. 
This can get difficult, of course, and needs to be implemented in a way which doesn't slow us down too much. > - DDF Metadata support: Future products will use the 'DDF' on-disk > metadata scheme. These products will be bootable by the BIOS, but > must have DDF support in the OS. This will plug into the abstraction > mentioned above. OK. How does the DDF metadata differ from the current md data? Is it merely the layout, or are there functional differences? In particular, I'm wondering whether partitions using the new activity logging features of md will still be bootable, or whether the boot partitions need to be 'md classic'. > bit due to the radical changes in the disk/block layer in 2.6. The 2.4 > version works quite well, while the 2.6 version is fairly fresh. I'd be reluctant doing any of the work for 2.4, but this is of course up to you. Sincerely, Lars Marowsky-Brée <lmb@suse.de> -- High Availability & Clustering \ ever tried. ever failed. no matter. SUSE Labs | try again. fail again. fail better. Research & Development, SUSE LINUX AG \ -- Samuel Beckett ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 10:24 ` Lars Marowsky-Bree @ 2004-01-13 18:03 ` Scott Long 2004-01-16 9:29 ` Lars Marowsky-Bree 0 siblings, 1 reply; 33+ messages in thread From: Scott Long @ 2004-01-13 18:03 UTC (permalink / raw) To: Lars Marowsky-Bree; +Cc: linux-raid Lars Marowsky-Bree wrote: > On 2004-01-12T20:41:54, > Scott Long <scott_long@adaptec.com> said: > > Hi Scott, this is good to see! > > >>- partition support for md devices: MD does not support the concept of >> fdisk partitions; the only way to approximate this right now is by >> creating multiple arrays on the same media. Fixing this is required >> for not only feature-completeness, but to allow our BIOS to recognise >> the partitions on an array and properly boot them as it would boot a >> normal disk. > > > I'm not too excited about this, because Device Mapping on top of md is > much more flexible, but I see that users want it, and it should be > pretty easy to add. > The biggest issue here is that a real fdisk table needs to exist on the array in order for our BIOS to recognise it as a boot device. While Device Mapper can probably do a good job at creating logical storage extents out of a single md device, it doesn't get us any closer to being able to boot off of an MD array. > >>- generic device arrival notification mechanism: This is needed to >> support device hot-plug, and allow arrays to be automatically >> configured regardless of when the md module is loaded or initialized. >> RedHat EL3 has a scaled down version of this already, but it is >> specific to MD and only works if MD is statically compiled into the >> kernel. A general mechanism will benefit MD as well as any other >> storage system that wants hot-arrival notices. > > > Yes. Is anything missing from the 2.6 & hotplug & udev solution which > you require? > I'll admit that I'm not as familiar with 2.6 as I should be. Does a disk arrival mechanism already exist? 
> >>- RAID-0 fixes: The MD RAID-0 personality is unable to perform I/O >> that spans a chunk boundary. Modifications are needed so that it can >> take a request and break it up into 1 or more per-disk requests. > > > Agreed. > > >>- Metadata abstraction: We intend to support multiple on-disk metadata >> formats, along with the 'native MD' format. To do this, specific >> knowledge of MD on-disk structures must be abstracted out of the core >> and personalities modules. > > > This can get difficult, of course, and needs to be implemented in a way > which doesn't slow us down too much. > Normal I/O doesn't touch the metadata. Only during error recovery and configuration would this be touched. Instead of the core and personality modules directly manipulating the metadata, a set of metadata-specific function pointers will be called through to handle changing the on-disk metadata. So, no significant operational overhead is introduced. > >>- DDF Metadata support: Future products will use the 'DDF' on-disk >> metadata scheme. These products will be bootable by the BIOS, but >> must have DDF support in the OS. This will plug into the abstraction >> mentioned above. > > > OK. How does the DDF metadata differ from the current md data? Is it > merely the layout, or are there functional differences? > I'm not sure if the DDF spec has been officially published yet. It defines a set of data structures and their location on the disk that allows disk to be uniquely identified, logical extents to be grouped into arrays, recording of disk and array state, and event logging. It is completely different from the metadata that is used for classic MD. However, it is still compatible with the high-level striping and mirroring operations of MD. > In particular, I'm wondering whether partitions using the new activity > logging features of md will still be bootable, or whether the boot > partitions need to be 'md classic'. Our products will only recognise and boot off of DDF arrays. 
They have no concept of classic MD metadata. The goal of the abstraction is to allow new metadata personalities to be plugged in and 'Just Work', while not inhibiting the choice of using whatever metadata is most suitable for existing arrays. If you need to boot off of a DDF-aware controller, but use classic MD for secondary arrays, that will work. > > >>bit due to the radical changes in the disk/block layer in 2.6. The 2.4 >>version works quite well, while the 2.6 version is fairly fresh. > > > I'd be reluctant doing any of the work for 2.4, but this is of course > up to you. This work was originally started on 2.4. With the closing of 2.4 and release of 2.6, we are porting our work forward. It would be nice to integrate the changes into 2.4 also, but we recognise the need for 2.4 to remain as stable as possible. Scott ^ permalink raw reply [flat|nested] 33+ messages in thread
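[Editor's note: the metadata abstraction Scott describes — the core calling through metadata-specific function pointers instead of touching on-disk structures directly — might look roughly like the following. This is a userspace Python sketch of the idea; all class and method names are invented here, and the real interface would be a C ops structure in the kernel.]

```python
# Illustrative sketch of a metadata "personality" table: the core calls
# through these hooks rather than manipulating on-disk structures itself.
# Names are hypothetical, not the actual emd/md interface.

class MetadataOps:
    name = "none"
    def load(self, disk): raise NotImplementedError        # parse superblock
    def mark_faulty(self, disk): raise NotImplementedError # error path only

class ClassicMdOps(MetadataOps):
    name = "md-0.90"
    def load(self, disk): return {"format": self.name, "disk": disk}
    def mark_faulty(self, disk): return f"{disk}: faulty bit set in md superblock"

class DdfOps(MetadataOps):
    name = "ddf"
    def load(self, disk): return {"format": self.name, "disk": disk}
    def mark_faulty(self, disk): return f"{disk}: DDF physical-disk state updated"

# The core picks an ops table at assembly time; normal I/O never calls
# these hooks, only configuration and error recovery do -- which is why
# the abstraction adds no per-I/O overhead.
formats = {ops.name: ops for ops in (ClassicMdOps(), DdfOps())}
print(formats["ddf"].mark_faulty("sdb"))
```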
* Re: Proposed Enhancements to MD 2004-01-13 18:03 ` Scott Long @ 2004-01-16 9:29 ` Lars Marowsky-Bree 0 siblings, 0 replies; 33+ messages in thread From: Lars Marowsky-Bree @ 2004-01-16 9:29 UTC (permalink / raw) To: Scott Long; +Cc: linux-raid On 2004-01-13T11:03:40, Scott Long <scott_long@adaptec.com> said: > The biggest issue here is that a real fdisk table needs to exist on the > array in order for our BIOS to recognise it as a boot device. Hm, ok. > >Yes. Is anything missing from the 2.6 & hotplug & udev solution which > >you require? > > I'll admit that I'm not as familiar with 2.6 as I should be. Does a > disk arrival mechanism already exist? Yes. hotplug already will get you events when new disks arrive. > >In particular, I'm wondering whether partitions using the new activity > >logging features of md will still be bootable, or whether the boot > >partitions need to be 'md classic'. > > Our products will only recognise and boot off of DDF arrays. They have > no concept of classic MD metadata. OK. The question was meant differently. In 2.6, we have the ability to log resyncs and journal updates (see the discussions on linux-raid). I was just wondering whether DDF would allow this, or whether it is a simple minded "this disk good, that disk bad", and thus the boot drive might not be able to use the new md features with the DDF metadata. > This work was originally started on 2.4. With the closing of 2.4 and > release of 2.6, we are porting are work forward. It would be nice to > integrate the changes into 2.4 also, but we recognise the need for 2.4 > to remain as stable as possible. 2.4 is dead and shouldn't see new features. Sincerely, Lars Marowsky-Brée <lmb@suse.de> -- High Availability & Clustering \ ever tried. ever failed. no matter. SUSE Labs | try again. fail again. fail better. 
Research & Development, SUSE LINUX AG \ -- Samuel Beckett ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 3:41 Proposed Enhancements " Scott Long 2004-01-13 10:24 ` Lars Marowsky-Bree @ 2004-01-13 14:19 ` Matt Domsch 2004-01-13 17:13 ` Andreas Dilger ` (2 more replies) 1 sibling, 3 replies; 33+ messages in thread From: Matt Domsch @ 2004-01-13 14:19 UTC (permalink / raw) To: Scott Long; +Cc: linux-raid, linux-kernel On Mon, Jan 12, 2004 at 08:41:54PM -0700, Scott Long wrote: > - DDF Metadata support: Future products will use the 'DDF' on-disk > metadata scheme. These products will be bootable by the BIOS, but > must have DDF support in the OS. This will plug into the abstraction > mentioned above. For those unfamiliar with DDF (Disk Data Format), it is a Storage Networking Industry Association (SNIA) project ("Common RAID DDF TWG"), designed to provide a single metadata format to be used by all the RAID vendors (hardware and software alike). It removes vendor lock-in by having a metadata format that all can use, thus in theory you could move disks from an Adaptec hardware RAID controller to an LSI software RAID solution without reformatting the disks or touching your file systems in any way. Dell has been championing the DDF concept for quite a while, and is driving vendors from which we purchase RAID solutions to use DDF instead of their own individual metadata formats. I haven't seen the spec yet myself, but I'm led to believe that DDF allows for multiple logical drives to be created across a single set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on two 80GB spindles), as well as whole disks being used. It has a mechanism to support reconstruction checkpointing, so you don't have to restart a reconstruct from the beginning after a reboot, but from where you left off. And other useful features too that you'd expect in a common RAID solution. DDF is quickly becoming important to RAID and system vendors, and I welcome Adaptec's work to implement DDF support on Linux. Thanks, Matt -- Matt Domsch Sr. 
Software Engineer, Lead Engineer Dell Linux Solutions www.dell.com/linux Linux on Dell mailing lists @ http://lists.us.dell.com ^ permalink raw reply [flat|nested] 33+ messages in thread
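[Editor's note: the reconstruction checkpointing Matt describes boils down to persisting rebuild progress so that an interrupted rebuild resumes rather than restarts. A minimal sketch follows; the record layout and checkpoint interval are invented here, not taken from the DDF spec, and a plain dict stands in for an on-disk metadata record.]

```python
# Minimal sketch of reconstruction checkpointing: persist the last
# rebuilt stripe every `interval` stripes, so an interrupted rebuild
# resumes from the last checkpoint instead of stripe 0.

def rebuild(total_stripes, checkpoint, interval=100, stop_at=None):
    stripe = checkpoint.get("rebuilt", 0)        # resume point
    while stripe < total_stripes:
        if stop_at is not None and stripe == stop_at:
            return stripe                        # simulate a crash/reboot
        stripe += 1                              # "rebuild" one stripe
        if stripe % interval == 0:
            checkpoint["rebuilt"] = stripe       # write checkpoint to metadata
    checkpoint["rebuilt"] = total_stripes
    return stripe

ckpt = {}
rebuild(1000, ckpt, stop_at=350)     # interrupted at stripe 350
print(ckpt["rebuilt"])               # 300: last checkpoint before the crash
rebuild(1000, ckpt)                  # resumes from 300, not from 0
print(ckpt["rebuilt"])               # 1000
```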
* Re: Proposed Enhancements to MD 2004-01-13 14:19 ` Matt Domsch @ 2004-01-13 17:13 ` Andreas Dilger 2004-01-13 22:26 ` Andreas Dilger 2004-01-13 18:19 ` Kevin P. Fleming 2004-01-13 18:19 ` Jeff Garzik 2 siblings, 1 reply; 33+ messages in thread From: Andreas Dilger @ 2004-01-13 17:13 UTC (permalink / raw) To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel On Jan 13, 2004 08:19 -0600, Matt Domsch wrote: > On Mon, Jan 12, 2004 at 08:41:54PM -0700, Scott Long wrote: > > - DDF Metadata support: Future products will use the 'DDF' on-disk > > metadata scheme. These products will be bootable by the BIOS, but > > must have DDF support in the OS. This will plug into the abstraction > > mentioned above. > > For those unfamiliar with DDF (Disk Data Format), it is a Storage > Networking Industry Association (SNIA) project ("Common RAID DDF > TWG"), designed to provide a single metadata format to be used by all > the RAID vendors (hardware and software alike). It removes vendor > lock-in by having a metadata format that all can use, thus in theory > you could move disks from an Adaptec hardware RAID controller to an > LSI software RAID solution without reformatting the disks or touching > your file systems in any way. Dell has been championing the DDF > concept for quite a while, and is driving vendors from which we > purchase RAID solutions to use DDF instead of their own individual > metadata formats. > > I haven't seen the spec yet myself, but I'm lead to believe that > DDF allows for multiple logical drives to be created across a single > set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on > two 80GB spindles), as well as whole disks be used. It has a > mechanism to support reconstruction checkpointing, so you don't have > to restart a reconstruct from the beginning after a reboot, but from > where you left off. And other useful features too that you'd expect > in a common RAID solution. 
So, why not use EVMS and/or Device Mapper to read the DDF metadata and set up the mappings that way? Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 17:13 ` Andreas Dilger @ 2004-01-13 22:26 ` Andreas Dilger 0 siblings, 0 replies; 33+ messages in thread From: Andreas Dilger @ 2004-01-13 22:26 UTC (permalink / raw) To: Matt Domsch, Scott Long, linux-raid, linux-kernel On Jan 13, 2004 10:13 -0700, Andreas Dilger wrote: > So, why not use EVMS and/or Device Mapper to read the DDF metadata and > set up the mappings that way? PS - outgoing email was delayed, this has already been covered. Sorry. Cheers, Andreas -- Andreas Dilger http://sourceforge.net/projects/ext2resize/ http://www-mddsp.enel.ucalgary.ca/People/adilger/ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 14:19 ` Matt Domsch 2004-01-13 17:13 ` Andreas Dilger @ 2004-01-13 18:19 ` Kevin P. Fleming 2004-01-13 18:19 ` Jeff Garzik 2 siblings, 0 replies; 33+ messages in thread From: Kevin P. Fleming @ 2004-01-13 18:19 UTC (permalink / raw) To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel Matt Domsch wrote: > DDF is quickly becoming important to RAID and system vendors, and I > welcome Adaptec's work to implement DDF support on Linux. Fully agreed, the days of vendor-specific metadata formats need to be numbered (with a small number). Speaking as a customer with a CMD FC-to-SCSI RAID controller, which used to be dual-redundant but is now single (because of a dead unit), we are not looking forward to the day when the remaining controller dies and we lose all the data on the array due to a forced metadata format change. However, given that this will not likely be 2.6 material until after it's built and tested in 2.7 and then backported, it doesn't seem to make any sense to me to build any of this on top of the MD subsystem at all (see other replies about using DM instead). Additionally, it also does not seem to make any sense to build any of the DDF reading/writing/management in the kernel _at all_. There is no advantage to it being there once initramfs is a standard part of the boot process, so all of this should be done in userspace and just communicated into the kernel to tell it what logical devices to construct using which DM modules. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 14:19 ` Matt Domsch 2004-01-13 17:13 ` Andreas Dilger 2004-01-13 18:19 ` Kevin P. Fleming @ 2004-01-13 18:19 ` Jeff Garzik 2004-01-13 20:29 ` Chris Friesen 2004-01-13 21:10 ` Matt Domsch 2 siblings, 2 replies; 33+ messages in thread From: Jeff Garzik @ 2004-01-13 18:19 UTC (permalink / raw) To: Matt Domsch; +Cc: Scott Long, linux-raid, linux-kernel Matt Domsch wrote: > I haven't seen the spec yet myself, but I'm lead to believe that > DDF allows for multiple logical drives to be created across a single > set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on > two 80GB spindles), as well as whole disks be used. It has a Me either. Any idea if there will be a public comment period, or is the spec "locked" into 1.0 when it's released in a month or so? Jeff ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD 2004-01-13 18:19 ` Jeff Garzik @ 2004-01-13 20:29 ` Chris Friesen 2004-01-13 20:35 ` Matt Domsch 2004-01-13 21:10 ` Matt Domsch 1 sibling, 1 reply; 33+ messages in thread From: Chris Friesen @ 2004-01-13 20:29 UTC (permalink / raw) To: Jeff Garzik; +Cc: Matt Domsch, Scott Long, linux-raid, linux-kernel Jeff Garzik wrote: > Matt Domsch wrote: > >> I haven't seen the spec yet myself, but I'm lead to believe that >> DDF allows for multiple logical drives to be created across a single >> set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on >> two 80GB spindles), as well as whole disks be used. It has a How is this different than the 20GB RAID0 and six 15GB RAID1s that I've got on two 100GB spindles right now? I think it's on 2.4, might even be 2.2. Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Proposed Enhancements to MD
  2004-01-13 20:29 ` Chris Friesen
@ 2004-01-13 20:35 ` Matt Domsch
  0 siblings, 0 replies; 33+ messages in thread
From: Matt Domsch @ 2004-01-13 20:35 UTC (permalink / raw)
To: Chris Friesen; +Cc: Jeff Garzik, Scott Long, linux-raid, linux-kernel

On Tue, Jan 13, 2004 at 03:29:45PM -0500, Chris Friesen wrote:
> How is this different than the 20GB RAID0 and 6 15GB RAID1s that I've
> got on two 100GB spindles right now?

Indeed, md does this with partitions on the disks today, so it is analogous. DDF does this with disk extents, which provide the same functionality as partitions but are defined by an on-disk metadata format rather than an MSDOS partition table (yes, partition tables are metadata too...).

The solution needs partitions/extents in two places: 1) below the logical drive, from which logical drives are created; 2) above the logical drive, on which multiple file systems are created. md provides 1) today, and as discussed today patches exist to do 2) that have not yet been merged.

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply [flat|nested] 33+ messages in thread
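Matt's point about below-the-LD extents can be made concrete with the numbers from the quoted example: a 10GB RAID1 LD and a 140GB RAID0 LD carved from two 80GB spindles. The allocator below is a toy (greedy first-fit, equal extents from every disk, two RAID levels only), invented purely to show the arithmetic; real DDF extent records are far richer.

```python
# Toy extent allocator: carve per-disk extents for logical drives that
# share the same spindles, as in the 10GB RAID1 + 140GB RAID0 example.
# Greedy first-fit, purely illustrative.

SECTOR = 512
GB = 1024**3 // SECTOR  # sectors per GiB

def allocate(disks, lds):
    """disks: {name: size_in_sectors}
    lds: [(name, level, usable_sectors)] with level "raid1" or "raid0".
    Returns {ld_name: [(disk, start_sector, length_sectors), ...]}."""
    cursor = {d: 0 for d in disks}  # next free sector on each disk
    layout = {}
    for name, level, usable in lds:
        # RAID1 mirrors the full LD on each disk; RAID0 stripes it.
        per_disk = usable if level == "raid1" else usable // len(disks)
        extents = []
        for d in disks:
            assert cursor[d] + per_disk <= disks[d], "disk too small"
            extents.append((d, cursor[d], per_disk))
            cursor[d] += per_disk
        layout[name] = extents
    return layout

disks = {"sda": 80 * GB, "sdb": 80 * GB}
lds = [("ld0", "raid1", 10 * GB), ("ld1", "raid0", 140 * GB)]
print(allocate(disks, lds))
# ld0 takes sectors 0..10GB of each disk; ld1 takes the remaining 70GB
# extent of each, exactly filling both 80GB spindles.
```

The same arithmetic works with MSDOS partitions in place of extents, which is Matt's point: the difference is where the boundaries are recorded, not what they do.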
* Re: Proposed Enhancements to MD
  2004-01-13 18:19 ` Jeff Garzik
  2004-01-13 20:29 ` Chris Friesen
@ 2004-01-13 21:10 ` Matt Domsch
  1 sibling, 0 replies; 33+ messages in thread
From: Matt Domsch @ 2004-01-13 21:10 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Scott Long, linux-raid, linux-kernel

On Tue, Jan 13, 2004 at 01:19:56PM -0500, Jeff Garzik wrote:
> Matt Domsch wrote:
> > I haven't seen the spec yet myself, but I'm led to believe that
> > DDF allows for multiple logical drives to be created across a single
> > set of disks (e.g. a 10GB RAID1 LD and a 140GB RAID0 LD together on
> > two 80GB spindles), as well as whole disks be used. It has a
>
> Me either. Any idea if there will be a public comment period, or is the
> spec "locked" into 1.0 when it's released in a month or so?

As it happens, Bill Dawkins of Dell is the DDF committee chair at SNIA. Here's what he's told me:

The current draft of the DDF specification is available for review to any member of SNIA. This is a "Work in Progress" draft. Anyone in a member company can go to www.snia.org and sign up for web access. They will then have to sign up for the DDF Technical Working Group. Acceptance to the DDF TWG is automatic and the current documents are available there.

(As Dell is a member, I signed up for the DDF TWG as an observer. Other companies are also on the member list, including Sistina, so Jeff you may be able to get a Sistina colleague to mail you a copy. http://www.snia.org/about/member_list has the list of member companies. - Matt)

For people and companies who are not members of SNIA, I am writing to the SNIA Technical Director to see if I can release copies of the draft spec now. I'll let you know when I get a response.

As for the timeline, we have a face-to-face meeting of the DDF TWG next Tuesday, and it is our intent to vote on releasing the specification as a "Trial Use" specification for public review and comment.
If the vote is affirmative, the SNIA Technical Council will have to meet to determine when and if to release the "Trial Use" specification. This may take a few months, so we are probably looking at March for full release. Feel free to share this information with your Linux contacts.

So, for now, if you're in SNIA, you can get access to the draft spec, and in a few months the draft spec should be publicly available.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply [flat|nested] 33+ messages in thread
Thread overview: 33+ messages
[not found] <40033D02.8000207@adaptec.com>
2004-01-13 18:44 ` Proposed enhancements to MD Jeff Garzik
2004-01-13 19:01 ` John Bradford
2004-01-13 19:41 ` Matt Domsch
2004-01-13 22:10 ` Arjan van de Ven
2004-01-16 9:31 ` Lars Marowsky-Bree
2004-01-16 9:57 ` Arjan van de Ven
2004-01-13 20:41 ` Scott Long
2004-01-13 22:33 ` Jure Pečar
2004-01-13 22:44 ` Scott Long
2004-01-13 22:56 ` viro
2004-01-14 15:52 ` Kevin Corry
2004-01-13 22:42 ` Luca Berra
2004-01-14 23:07 ` Neil Brown
2004-01-15 11:10 ` Norman Schmidt
2004-01-15 21:52 ` Matt Domsch
2004-01-16 9:24 ` Lars Marowsky-Bree
2004-01-16 13:43 ` Matt Domsch
2004-01-16 13:56 ` Lars Marowsky-Bree
2004-01-16 14:06 ` Christoph Hellwig
2004-01-16 14:11 ` Matt Domsch
2004-01-16 14:13 ` Christoph Hellwig
2004-01-13 3:41 Proposed Enhancements " Scott Long
2004-01-13 10:24 ` Lars Marowsky-Bree
2004-01-13 18:03 ` Scott Long
2004-01-16 9:29 ` Lars Marowsky-Bree
2004-01-13 14:19 ` Matt Domsch
2004-01-13 17:13 ` Andreas Dilger
2004-01-13 22:26 ` Andreas Dilger
2004-01-13 18:19 ` Kevin P. Fleming
2004-01-13 18:19 ` Jeff Garzik
2004-01-13 20:29 ` Chris Friesen
2004-01-13 20:35 ` Matt Domsch
2004-01-13 21:10 ` Matt Domsch