* RAID5 reconstruction ?
From: SandeepKsinha @ 2009-05-30 5:44 UTC
To: Linux RAID

Hi all,

Say I have a RAID 5 array of 50GB made of five disks of 10GB each.

I have 5GB of data. When a disk fails and is replaced with a spare disk, will the reconstruction happen only for the 5GB of allocated disk blocks, or for the whole disk size?

Is it possible to make reconstruction intelligent enough to keep it optimized?

Thanks.

--
Regards,
Sandeep.

“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

* Re: RAID5 reconstruction ?
From: Sujit Karataparambil @ 2009-05-30 12:52 UTC
To: SandeepKsinha; +Cc: Linux RAID

Are you talking about hardware RAID or software RAID?

I do not think hardware RAID can by itself do much of the bookkeeping.

Software RAID, I think, is different from having raid 0 to raid 1 to raid 2 to raid 3 to raid 4 to raid 5.

I hope this is correct information.

Thanks,

--
-- Sujit K M

* Re: RAID5 reconstruction ?
From: SandeepKsinha @ 2009-05-30 13:28 UTC
To: Sujit Karataparambil; +Cc: Linux RAID

On Sat, May 30, 2009 at 6:22 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> Are you talking about hardware raid or software raid.

I am talking about mdraid.

--
Regards,
Sandeep.

* Re: RAID5 reconstruction ?
From: Sujit Karataparambil @ 2009-05-30 13:31 UTC
To: SandeepKsinha; +Cc: Linux RAID

Sorry, could you tell us what you are looking for?

On Sat, May 30, 2009 at 6:58 PM, SandeepKsinha <sandeepksinha@gmail.com> wrote:
> I am taking about mdraid.

--
-- Sujit K M

* Re: RAID5 reconstruction ?
From: Nifty Fedora Mitch @ 2009-06-09 4:13 UTC
To: Sujit Karataparambil; +Cc: SandeepKsinha, Linux RAID

On Sat, May 30, 2009 at 06:22:40PM +0530, Sujit Karataparambil wrote:
> On Sat, May 30, 2009 at 11:14 AM, SandeepKsinha <sandeepksinha@gmail.com> wrote:
> > Say If I have a RAID 5 array of 50GB of five disks of 10GB each.
> >
> > I have data of 5GB. When a disk fails and replaced with a spare disk.
> > Will the reconstruction happen only for the 5GB allocated disk blocks
> > or it will happen for the whole disk size.
> >
> > Is it possible to make reconstruction intelligent enough to keep it optimized ?

Sandeep,

It might help if you think of your 50GB RAID as a single disk with special properties of redundancy and management.

A single disk has no "knowledge" about the file system it contains. As such, it will not know anything about the file system's data allocation and will be "dumb" about which blocks, stripes, or whatever it needs to reconstruct or repair in relation to the file system.

It is possible that unallocated file system regions will still be zeroed blocks and recovery is quick. It is also possible that the RAID data structures will remember whether a region has been read or written to and needs to be recovered or initialized.

To the best of my knowledge, RAID-N, from vendor to vendor and hardware vs. software, only defines the type and general patterns of redundancy. I.e., I do not expect that I can pull a disk set from one RAID vendor and have another vendor understand and present the logical blocks in the identical way.

To set expectations: the recovery of a 50GB RAID with 0.1GB of data in its file system or with 45GB of data in its file system will be the same unless the specific RAID implementation keeps a score card.

More apropos, the RAID does not know anything special except at initialization, and an old RAID, in contrast to a new filesystem, will want to recover the entire RAID (itself), knowing nothing about the history of the filesystems it contains.

Others may be able to add more details for your specific RAID.

--
T o m M i t c h e l l
Found me a new hat, now what?
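
To make the "dumb disk" point concrete, here is a minimal sketch of parity reconstruction. It is a toy model and not md's code: it assumes a dedicated parity member (real RAID 5 rotates parity across members), 2-byte blocks, and hypothetical helper names `xor_blocks` and `rebuild_member`. The rebuild loop visits every stripe because nothing at this layer records which stripes the filesystem has used.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_member(members, failed):
    """Recompute every block of the failed member from the survivors.

    `members` is a list of per-disk block lists. Every stripe is visited,
    whether or not the filesystem ever stored anything in it.
    """
    survivors = [m for i, m in enumerate(members) if i != failed]
    return [xor_blocks(stripe) for stripe in zip(*survivors)]


# Toy array: two data members plus parity, four stripes.
# Only stripe 0 holds filesystem data; the rest are still zero.
d0 = [b"\x12\x34", b"\x00\x00", b"\x00\x00", b"\x00\x00"]
d1 = [b"\xab\xcd", b"\x00\x00", b"\x00\x00", b"\x00\x00"]
parity = [xor_blocks(pair) for pair in zip(d0, d1)]

rebuilt = rebuild_member([d0, d1, parity], failed=1)
print(rebuilt == d1, len(rebuilt))
```

All four stripes are recomputed even though three of them contain nothing the filesystem cares about.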

* Re: RAID5 reconstruction ?
From: John Robinson @ 2009-05-30 13:35 UTC
To: SandeepKsinha; +Cc: Linux RAID

On 30/05/2009 06:44, SandeepKsinha wrote:
> Say If I have a RAID 5 array of 50GB of five disks of 10GB each.
>
> I have data of 5GB. When a disk fails and replaced with a spare disk.
> Will the reconstruction happen only for the 5GB allocated disk blocks
> or it will happen for the whole disk size.

The whole disc size, for now anyway; md does not currently note which blocks have been used by its client (the filesystem, LVM, whatever).

> Is it possible to make reconstruction intelligent enough to keep it optimized ?

This has been discussed in combination with supporting SSD drives' TRIM function, and would mean md had to keep track of used chunks or possibly even sectors using a bitmap or something like that, but whether anyone's working on it I don't know.

Cheers,

John.
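
A rough sketch of the difference such tracking would make, using the numbers from the question. Note that five 10GB members give 40GB of usable space, since RAID 5 spends one member's worth of capacity on parity. The used-stripe bitmap here is hypothetical (md did not have one at the time of this thread), and the stripe count of 80 and the function names are illustrative assumptions.

```python
def raid5_usable_gb(n_disks, disk_gb):
    """RAID 5 spends one member's worth of capacity on parity."""
    return (n_disks - 1) * disk_gb


def stripes_to_rebuild(total_stripes, used_bitmap=None):
    """Return the stripe numbers a rebuild has to visit.

    Without a bitmap every stripe is rebuilt. With a (hypothetical)
    used-stripe bitmap, only stripes marked as used are rebuilt.
    """
    if used_bitmap is None:
        return list(range(total_stripes))
    return [i for i, used in enumerate(used_bitmap) if used]


usable = raid5_usable_gb(5, 10)

# Assume 80 stripes, with 5GB of the usable space (one eighth) in use
# and packed into the first stripes.
total = 80
bitmap = [i < total * 5 // usable for i in range(total)]

full = stripes_to_rebuild(total)
smart = stripes_to_rebuild(total, bitmap)
print(usable, len(full), len(smart))
```

In practice allocated blocks are scattered, so the saving depends on how finely the bitmap tracks usage.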

* Re: RAID5 reconstruction ?
From: Maxime Boissonneault @ 2009-05-30 14:06 UTC
To: John Robinson; +Cc: SandeepKsinha, Linux RAID

John Robinson wrote:
> This has been discussed in combination with supporting SSD drives'
> TRIM function, and would mean md had to keep track of used chunks or
> possibly even sectors using a bitmap or something like that, but
> whether anyone's working on it I don't know.

I don't know how it goes for Linux, but hasn't ZFS been developed exactly for that purpose? From what I understand, ZFS manages both the file system and the RAID features at once. Therefore, the "raid" part knows where the files are and the filesystem knows about the raid. Reconstruction is then done intelligently (not reconstructing unused space).

Maxime Boissonneault

* Re: RAID5 reconstruction ?
From: John Robinson @ 2009-05-30 15:46 UTC
To: Maxime Boissonneault; +Cc: Linux RAID

On 30/05/2009 15:06, Maxime Boissonneault wrote:
> I don't know how it goes for Linux, but hasn't ZFS been developped
> exactly for that purpose ? From what I understand, ZFS manage both the
> file system and the RAID features at once. Therefore, the "raid" part
> knows where are the files and the filesystem knows about the raid.
> Reconstruction is then done intelligently (not reconstructing unused
> space).

Yes, ZFS can do this, but ext4 and md could do it too, and probably will at some point. I prefer separating the layers, then I can switch one and not the other, e.g. start using a hardware RAID controller instead of md.

Cheers,

John.

* Re: RAID5 reconstruction ?
From: Maxime Boissonneault @ 2009-05-30 16:16 UTC
To: John Robinson; +Cc: Linux RAID

John Robinson wrote:
> Yes, ZFS can do this, but ext4 and md could do it too, and probably
> will at some point. I prefer separating the layers, then I can switch
> one and not the other, e.g. start using a hardware RAID controller
> instead of md.

Chances are that the hardware controller would not be compatible with mdadm and you would have to back up and copy your data anyway.

* Re: RAID5 reconstruction ?
From: John Robinson @ 2009-05-30 16:30 UTC
To: Maxime Boissonneault; +Cc: Linux RAID

On 30/05/2009 17:16, Maxime Boissonneault wrote:
> Chances are that the hardware controler would not be compatible with
> mdadm and you would have to backup and copy your data anyway.

Sure, but I can keep my filesystem, which I've chosen for whatever filesystem features it has, rather than having chosen it because it does RAID itself.

It would be nice if RAID controllers and md used the same metadata, though, wouldn't it, so we could swap discs between controllers and everything would Just Work? Umm, do any RAID controllers support SNIA DDF? I think md does (or soon will)...

Cheers,

John.

* Re: RAID5 reconstruction ?
From: Redeeman @ 2009-05-30 16:08 UTC
To: John Robinson; +Cc: SandeepKsinha, Linux RAID

On Sat, 2009-05-30 at 14:35 +0100, John Robinson wrote:
> This has been discussed in combination with supporting SSD drives' TRIM
> function, and would mean md had to keep track of used chunks or possibly
> even sectors using a bitmap or something like that, but whether anyone's
> working on it I don't know.

I would say it should be possible to 'query' the filesystem for that information. Obviously this will only work if you run a filesystem on it which supports it, but it would seem like a nicer solution than a bitmap for it.

* Re: RAID5 reconstruction ?
From: Bill Davidsen @ 2009-05-30 18:39 UTC
To: Redeeman; +Cc: John Robinson, SandeepKsinha, Linux RAID

Redeeman wrote:
> I would say it should be possible to 'query' the filesystem for that
> information. Obviously this will only work if you run a filesystem on it
> which supports it, but it would seem like a nicer solution than a bitmap
> for it.

There is a program which does that, but I think it was for ext2. It read the inodes and saved the data. I believe 'dump' does something similar, but it's on a file system basis and I don't recall (or care much about) the details.

--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back." - Representative Earl Pomeroy, Democrat of North Dakota on the A.I.G. executives who were paid bonuses after a federal bailout.

* Re: RAID5 reconstruction ?
From: Goswin von Brederlow @ 2009-05-30 18:54 UTC
To: Redeeman; +Cc: John Robinson, SandeepKsinha, Linux RAID

Redeeman <redeeman@metanurb.dk> writes:
> I would say it should be possible to 'query' the filesystem for that
> information. Obviously this will only work if you run a filesystem on it
> which supports it, but it would seem like a nicer solution than a bitmap
> for it.

On the other hand, checking a bitmap is quick. You could use the bitmap not only for reconstruction but also for reads. If the bitmap says the block is unused, you can skip the read and just zero-fill the buffer. This would speed up reads, and writes that don't cover the full stripe. And compared to a raid with the current bitmap there shouldn't be any real slowdown for the extra "unused" bit.

The only special case for resync would be that if all data blocks in a stripe are unused, then the parity block can be marked unused too.

MfG
Goswin
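
The two rules in this message can be sketched as follows. This is only an illustration of the proposal, not existing md behaviour: the per-block `used` list, the 4096-byte block size, and the names `read_block`, `parity_is_unused` and `fake_disk_read` are all assumptions made for the example.

```python
BLOCK_SIZE = 4096  # bytes; illustrative

reads = []  # records which blocks actually hit the "disk"


def fake_disk_read(index):
    """Stand-in for a real device read; logs the access."""
    reads.append(index)
    return b"\xff" * BLOCK_SIZE


def read_block(used, disk_read, index):
    """Return block `index`, skipping the device read if it is marked unused."""
    if not used[index]:
        return bytes(BLOCK_SIZE)  # zero fill, no I/O
    return disk_read(index)


def parity_is_unused(used_data_bits):
    """Parity may be marked unused only if every data block in the stripe is."""
    return not any(used_data_bits)


used = [True, False, False, False]  # one stripe with four data blocks
blocks = [read_block(used, fake_disk_read, i) for i in range(4)]

print(reads)                          # only block 0 was read from the device
print(parity_is_unused(used))         # one data block is in use
print(parity_is_unused([False] * 4))  # whole stripe unused
```

The same shortcut helps a partial-stripe write: peers marked unused can be treated as zeros when computing parity instead of being read first.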

* Re: RAID5 reconstruction ?
From: SandeepKsinha @ 2009-05-31 8:10 UTC
To: Goswin von Brederlow; +Cc: Redeeman, John Robinson, Linux RAID

On Sun, May 31, 2009 at 12:24 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
> Redeeman <redeeman@metanurb.dk> writes:
>> I would say it should be possible to 'query' the filesystem for that
>> information. Obviously this will only work if you run a filesystem on it
>> which supports it, but it would seem like a nicer solution than a bitmap
>> for it.

You have put a big constraint here with "filesystem which supports it". In general, I do not know of any file system work so far that leverages the underlying device topology to optimize its block allocation policies for enhanced I/O, or for any other reason.

Having a bitmap can surely have a lot of other benefits too. Looking at drive sizes in recent times, think of a situation where you have to do a reconstruction or resync. It might take months to complete. Also, in the meantime you will have degraded I/O. Only in the worst case, if your drive has most of its blocks allocated, will it be a penalty.

--
Regards,
Sandeep.

* Re: RAID5 reconstruction ?
From: Goswin von Brederlow @ 2009-05-30 18:55 UTC
To: Redeeman; +Cc: John Robinson, SandeepKsinha, Linux RAID

Redeeman <redeeman@metanurb.dk> writes:
> I would say it should be possible to 'query' the filesystem for that
> information. Obviously this will only work if you run a filesystem on it
> which supports it, but it would seem like a nicer solution than a bitmap
> for it.

And just when I hit send I thought of something else.

Instead of the initial sync when creating a raid, the bitmap could just mark all blocks as unused. Much faster raid creation.

MfG
Goswin
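
One way to see why this could work, as a toy sketch under stated assumptions: if unused blocks are defined to read as zeros, parity can be computed on the first write by treating unused peers as zero, so stale on-disk contents never need an initial sync. The `Stripe` class, its `used` flags and the garbage fill byte are hypothetical and do not reflect how md is implemented.

```python
def xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Stripe:
    """One stripe with n data blocks and one parity block, never synced."""

    def __init__(self, n_data, block_size, garbage=b"\xa5"):
        # Whatever happened to be on the disks at creation time.
        self.block_size = block_size
        self.data = [garbage * block_size for _ in range(n_data)]
        self.parity = garbage * block_size
        self.used = [False] * n_data  # the hypothetical "used" bits

    def logical(self, i):
        """Contents as the client sees them: unused blocks read as zeros."""
        return self.data[i] if self.used[i] else bytes(self.block_size)

    def write(self, i, payload):
        """Store a block and recompute parity from the logical contents."""
        self.data[i] = payload
        self.used[i] = True
        parity = bytes(self.block_size)
        for j in range(len(self.data)):
            parity = xor(parity, self.logical(j))
        self.parity = parity

    def reconstruct(self, lost):
        """Recover one data block from parity and the other blocks."""
        out = self.parity
        for j in range(len(self.data)):
            if j != lost:
                out = xor(out, self.logical(j))
        return out


s = Stripe(n_data=4, block_size=8)
s.write(2, b"ABCDEFGH")
print(s.reconstruct(2))  # the written block is recoverable
print(s.reconstruct(0))  # an unused block reconstructs as zeros
```

The cost is that every parity update has to consult the used bits, which is the bookkeeping the rest of the thread discusses.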

* Re: RAID5 reconstruction ?
From: Redeeman @ 2009-05-30 19:37 UTC
To: Goswin von Brederlow; +Cc: John Robinson, SandeepKsinha, Linux RAID

On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote:
> And just when I hit send I thought of something else.
>
> Instead of the initial sync when creating a raid the bitmap could just
> mark all blocks as unused. Much faster raid creation.

A filesystem-coexist mode could also do this, by simply refusing operation until such a time that a filesystem is detected, or I suppose in the worst case, mounted...

* Re: RAID5 reconstruction ?
From: SandeepKsinha @ 2009-05-31 8:02 UTC
To: Redeeman; +Cc: Goswin von Brederlow, John Robinson, Linux RAID

On Sun, May 31, 2009 at 1:07 AM, Redeeman <redeeman@metanurb.dk> wrote:
> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote:
>> Instead of the initial sync when creating a raid the bitmap could just
>> mark all blocks as unused. Much faster raid creation.

This really sounds like a good option. This would have a slight hit for writes, which I believe later reconstructions, disk replacements, mirror resyncs and many other operations will compensate for.

Neil, any comments on this?

I mean, how difficult would it be to maintain such a bitmap? This could be made optional, as it will cost space.

--
Regards,
Sandeep.
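
On the space question, a back-of-the-envelope sketch. It assumes one bit per tracked region, reads the 10GB member from the original question as 10 GiB, and compares a fine granularity of 4 KiB (a common filesystem block size) with a coarse, purely illustrative 64 MiB region. The function name `bitmap_bytes` is hypothetical.

```python
def bitmap_bytes(device_bytes, granularity_bytes):
    """Bytes needed for a bitmap with one bit per `granularity_bytes` region."""
    bits = -(-device_bytes // granularity_bytes)  # ceiling division
    return -(-bits // 8)


GIB = 1024 ** 3
member = 10 * GIB  # one array member, taken as 10 GiB

fine = bitmap_bytes(member, 4 * 1024)          # one bit per 4 KiB block
coarse = bitmap_bytes(member, 64 * 1024 ** 2)  # one bit per 64 MiB region

print(fine, fine // 1024)  # bytes, KiB
print(coarse)              # bytes
```

The fine-grained map comes to 320 KiB per 10 GiB member, which is small next to the disk but far larger than the 20 bytes of the coarse one, so the trade-off is mostly about update traffic and not raw space.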
* Re: RAID5 reconstruction ? 2009-05-31 8:02 ` SandeepKsinha @ 2009-05-31 11:54 ` Goswin von Brederlow 2009-05-31 12:11 ` John Robinson 2009-05-31 12:14 ` NeilBrown 0 siblings, 2 replies; 22+ messages in thread From: Goswin von Brederlow @ 2009-05-31 11:54 UTC (permalink / raw) To: SandeepKsinha; +Cc: Redeeman, Goswin von Brederlow, John Robinson, Linux RAID SandeepKsinha <sandeepksinha@gmail.com> writes: > On Sun, May 31, 2009 at 1:07 AM, Redeeman <redeeman@metanurb.dk> wrote: >> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote: >>> And just when I hit send I thought of something else. >>> >>> Instead of the initial sync when creating a raid the bitmap could just >>> mark all blocks as unused. Much faster raid creation. >> > > > This really sounds like a good option. This would have a slight hit > for writes which I believe will compensate for later re-constructions, > replacing a disk, mirror resysnc and many more operation. What hit? Currently with bitmap support a write will set the block to "unclean", write the data, write the parity and set the block to "clean". Setting the "used" bit along the way should not cost much. The only difference I see is that the bitmap would have to have finer granularity, so that one "used" bit covers one filesystem block (usually 4k). Otherwise you could only "use" blocks but not "unuse" them again when the filesystem frees them in 4k chunks. > Neil any comments on this? > > I mean, how difficult would it be to maintain such a bitmap. This can > though of making something optional as this will incur space. > > >> A filesystem-coexist mode could also do this, by simply refusing >> operation until such a time that a filesystem is detected, or i suppose >> in worst case, mounted...
MfG Goswin ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID5 reconstruction ? 2009-05-31 11:54 ` Goswin von Brederlow @ 2009-05-31 12:11 ` John Robinson 2009-05-31 12:14 ` NeilBrown 1 sibling, 0 replies; 22+ messages in thread From: John Robinson @ 2009-05-31 12:11 UTC (permalink / raw) To: Linux RAID On 31/05/2009 12:54, Goswin von Brederlow wrote: > SandeepKsinha <sandeepksinha@gmail.com> writes: >>> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote: >>>> And just when I hit send I thought of something else. >>>> >>>> Instead of the initial sync when creating a raid the bitmap could just >>>> mark all blocks as unused. Much faster raid creation. >> >> This really sounds like a good option. This would have a slight hit >> for writes which I believe will compensate for later re-constructions, >> replacing a disk, mirror resysnc and many more operation. > > What hit? Currently with bitmap support a write will set the block to > "unclean", write the data, write the parity and set the block to > "clean". Setting the "used" bit along the way should not cost much. > > Only difference I see is that the bitmap would have to have finer > granularity so one "used" bit covers one filesystem block (4k usualy). > Otherwise you could only "use" blocks but not "unuse" them again when > the filesystem frees them in 4k chunks. I think the whole thing probably ought to be done in such a way as to support the pass-down and pass-through of TRIM/DISCARD commands, which I vaguely recall from previous discussions operate at sector granularity. The idea would be for md to be able to use a bitmap (or some other data structure for a free/used block/sector list) when operating over devices which don't support TRIM/DISCARD themselves, but take advantage of the devices' own capability when it's there - and since it'll be SSDs, we'd want to avoid repeatedly rewriting a bitmap, since the point of TRIM/DISCARD is to help SSDs manage wear levelling.
I am assuming that devices supporting TRIM/DISCARD are able to indicate whether a given sector is used or free; if they can't, and just return arbitrary data, we would have to keep a bitmap (or whatever) in md to be able to support TRIM/DISCARD at all. Of course any bitmap (or whatever) might still be optimised if we know md and its clients never use anything smaller than e.g. 4k. Cheers, John. ^ permalink raw reply [flat|nested] 22+ messages in thread
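The granularity point raised in this subthread can be illustrated with a small Python sketch of the md-side fallback bitmap, i.e. the case where the member devices offer no TRIM/DISCARD of their own. The class and method names are hypothetical, the bitmap lives only in memory, and pass-through to a capable device is not modelled.

```python
class UsedBitmap:
    """Toy 'used' bitmap: one bit per granule of `granule` bytes."""

    def __init__(self, size, granule):
        self.granule = granule
        self.bits = [False] * -(-size // granule)  # ceiling division

    def write(self, offset, length):
        """A write marks every granule it touches as used."""
        first = offset // self.granule
        last = (offset + length - 1) // self.granule
        newly_used = [g for g in range(first, last + 1) if not self.bits[g]]
        for g in newly_used:
            self.bits[g] = True
        return newly_used  # granules whose parity would need initialising

    def discard(self, offset, length):
        """A discard may only clear granules it covers completely."""
        first = -(-offset // self.granule)       # round the start up
        end = (offset + length) // self.granule  # round the end down
        for g in range(first, end):
            self.bits[g] = False
        return list(range(first, end))
```

With a 4k granule, freeing one 4k filesystem block clears exactly one bit. With a coarser granule the same discard covers no complete granule and clears nothing, which is the "use but never unuse" effect described above, unless the filesystem sends larger aligned discards.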
* Re: RAID5 reconstruction ? 2009-05-31 11:54 ` Goswin von Brederlow 2009-05-31 12:11 ` John Robinson @ 2009-05-31 12:14 ` NeilBrown 2009-06-03 1:54 ` Greg Freemyer 1 sibling, 1 reply; 22+ messages in thread From: NeilBrown @ 2009-05-31 12:14 UTC (permalink / raw) Cc: SandeepKsinha, Redeeman, Goswin von Brederlow, John Robinson, Linux RAID On Sun, May 31, 2009 9:54 pm, Goswin von Brederlow wrote: > SandeepKsinha <sandeepksinha@gmail.com> writes: > >> On Sun, May 31, 2009 at 1:07 AM, Redeeman <redeeman@metanurb.dk> wrote: >>> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote: >>>> And just when I hit send I thought of something else. >>>> >>>> Instead of the initial sync when creating a raid the bitmap could just >>>> mark all blocks as unused. Much faster raid creation. >>> >> >> >> This really sounds like a good option. This would have a slight hit >> for writes which I believe will compensate for later re-constructions, >> replacing a disk, mirror resysnc and many more operation. > > What hit? Currently with bitmap support a write will set the block to > "unclean", write the data, write the parity and set the block to > "clean". Setting the "used" bit along the way should not cost much. > > Only difference I see is that the bitmap would have to have finer > granularity so one "used" bit covers one filesystem block (4k usualy). > Otherwise you could only "use" blocks but not "unuse" them again when > the filesystem frees them in 4k chunks. But the filesystem could "unuse" blocks in larger chunks. There is this thing called "thin provisioning" and I believe the proponents of that would like the "TRIM" command to be sent in aligned multiples of 1Gigabyte or something like that. I believe this is one aspect of Linux TRIM support that is still open. I think there would be real value in providing an 'allocated' bitmap even if it were quite coarsely grained. 
The problem with a very large grain is that every time you set a bit, you need to resync that region, and you don't want that to take too long. So 1 gig (10-30 seconds?) would be an upper limit, I would think. If you used 1 sector for the bitmap, that is 4096 bits, so on a terabyte array you have 256Meg chunks that resync in a few seconds. Certainly an interesting idea to experiment with, I think. NeilBrown > >> Neil any comments on this? >> >> I mean, how difficult would it be to maintain such a bitmap. This can >> though of making something optional as this will incur space. >> >> >>> A filesystem-coexist mode could also do this, by simply refusing >>> operation until such a time that a filesystem is detected, or i suppose >>> in worst case, mounted... ^ permalink raw reply [flat|nested] 22+ messages in thread
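The figures in the message above can be checked with a few lines of Python. The 512-byte sector and the reading of "a terabyte" as 1 TiB are assumptions, and the throughput passed to the helper is a made-up example value, not a measurement.

```python
SECTOR_BYTES = 512                 # assumed classic sector size
ARRAY_BYTES = 2**40                # "a terabyte array", taken as 1 TiB

bits = SECTOR_BYTES * 8            # bits available in a one-sector bitmap
chunk_bytes = ARRAY_BYTES // bits  # array region covered by each bit

def resync_seconds(region_bytes, mib_per_s):
    """Time to resync one region at an assumed sustained throughput."""
    return region_bytes / (mib_per_s * 2**20)

print(bits)                        # 4096
print(chunk_bytes // 2**20)        # 256 (MiB per bit)
```

At an assumed 100 MiB/s, resync_seconds(chunk_bytes, 100) comes to about 2.6 seconds per 256 MiB region and resync_seconds(2**30, 100) to about 10 seconds per gigabyte, in line with the "few seconds" and "10-30 seconds" estimates.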
* Re: RAID5 reconstruction ? 2009-05-31 12:14 ` NeilBrown @ 2009-06-03 1:54 ` Greg Freemyer 0 siblings, 0 replies; 22+ messages in thread From: Greg Freemyer @ 2009-06-03 1:54 UTC (permalink / raw) To: NeilBrown Cc: Goswin von Brederlow, SandeepKsinha, Redeeman, John Robinson, Linux RAID On Sun, May 31, 2009 at 8:14 AM, NeilBrown <neilb@suse.de> wrote: > On Sun, May 31, 2009 9:54 pm, Goswin von Brederlow wrote: >> SandeepKsinha <sandeepksinha@gmail.com> writes: >> >>> On Sun, May 31, 2009 at 1:07 AM, Redeeman <redeeman@metanurb.dk> wrote: >>>> On Sat, 2009-05-30 at 20:55 +0200, Goswin von Brederlow wrote: >>>>> And just when I hit send I thought of something else. >>>>> >>>>> Instead of the initial sync when creating a raid the bitmap could just >>>>> mark all blocks as unused. Much faster raid creation. >>>> >>> >>> >>> This really sounds like a good option. This would have a slight hit >>> for writes which I believe will compensate for later re-constructions, >>> replacing a disk, mirror resysnc and many more operation. >> >> What hit? Currently with bitmap support a write will set the block to >> "unclean", write the data, write the parity and set the block to >> "clean". Setting the "used" bit along the way should not cost much. >> >> Only difference I see is that the bitmap would have to have finer >> granularity so one "used" bit covers one filesystem block (4k usualy). >> Otherwise you could only "use" blocks but not "unuse" them again when >> the filesystem frees them in 4k chunks. > > But the filesystem could "unuse" blocks in larger chunks. > There is this thing called "thin provisioning" and I believe the proponents > of that would like the "TRIM" command to be sent in aligned multiples > of 1Gigabyte or something like that. > I believe this is one aspect of Linux TRIM support that is still open. > > I think there would be real value in providing an 'allocated' > bitmap even if it were quite coarsely grained. 
> The problem with a very large grain is that every time you set a bit, > you need to resync that region, and you don't want that to take too long. > So 1 gig (10-30seconds?) would be an upper limit I would thing. > If you used 1 sector for the bitmap, that is 4096 bits so on a terabyte > array, you have 256Meg chunks that resync in a few seconds. > > Certainly an interesting idea to experiment with I think. > > NeilBrown > Neil, Is there likely to be any discussion of how trim / unmap will be invoked by the filesystem layer at OLS? Or how do those decisions get made? ie. The ext4 list was recently talking about sending down very small grained info, but large grained seems to make a lot more sense to me. Hopefully, each filesystem is not given the ability to decide for themselves. Very much seems like something the lk community should have input into, not just the ext4 maintainer. Greg -- Greg Freemyer Head of EDD Tape Extraction and Processing team Litigation Triage Solutions Specialist http://www.linkedin.com/in/gregfreemyer First 99 Days Litigation White Paper - http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf The Norcross Group The Intersection of Evidence & Technology http://www.norcrossgroup.com ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID5 reconstruction ? 2009-05-30 18:55 ` Goswin von Brederlow 2009-05-30 19:37 ` Redeeman @ 2009-06-02 18:42 ` Bill Davidsen 1 sibling, 0 replies; 22+ messages in thread From: Bill Davidsen @ 2009-06-02 18:42 UTC (permalink / raw) To: Goswin von Brederlow; +Cc: Redeeman, John Robinson, SandeepKsinha, Linux RAID Goswin von Brederlow wrote: > Redeeman <redeeman@metanurb.dk> writes: > > >> On Sat, 2009-05-30 at 14:35 +0100, John Robinson wrote: >> >>> On 30/05/2009 06:44, SandeepKsinha wrote: >>> >>>> Hi all, >>>> >>>> Say If I have a RAID 5 array of 50GB of five disks of 10GB each. >>>> >>>> I have data of 5GB. When a disk fails and replaced with a spare disk. >>>> Will the reconstruction happen only for the 5GB allocated disk blocks >>>> or it will happen for the whole disk size. >>>> >>> The whole disc size, for now anyway; md does not currently note which >>> blocks have been used by its client (the filesystem, LVM, whatever). >>> >>> >>>> Is it possible to make reconstruction intelligent enough to keep it optimized ? >>>> >>> This has been discussed in combination with supporting SSD drives' TRIM >>> function, and would mean md had to keep track of used chunks or possibly >>> even sectors using a bitmap or something like that, but whether anyone's >>> working on it I don't know. >>> >> I would say it should be possible to 'query' the filesystem for that >> information. Obviously this will only work if you run a filesystem on it >> which supports it, but it would seem like a nicer solution than a bitmap >> for it. >> >> >>> Cheers, >>> >>> John. >>> > > And just when I hit send I thought of something else. > > Instead of the initial sync when creating a raid the bitmap could just > mark all blocks as unused. Much faster raid creation. > That sounds a lot like what I mentioned, therefore it must be right. See the thread on sync on a new array, my reply to Neil. 
-- Bill Davidsen <davidsen@tmr.com> Even purely technical things can appear to be magic, if the documentation is obscure enough. For example, PulseAudio is configured by dancing naked around a fire at midnight, shaking a rattle with one hand and a LISP manual with the other, while reciting the GNU manifesto in hexadecimal. The documentation fails to note that you must circle the fire counter-clockwise in the southern hemisphere. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2009-06-09 4:13 UTC | newest]

Thread overview: 22+ messages
2009-05-30 5:44 RAID5 reconstruction ? SandeepKsinha
2009-05-30 12:52 ` Sujit Karataparambil
2009-05-30 13:28 ` SandeepKsinha
2009-05-30 13:31 ` Sujit Karataparambil
2009-06-09 4:13 ` Nifty Fedora Mitch
2009-05-30 13:35 ` John Robinson
2009-05-30 14:06 ` Maxime Boissonneault
2009-05-30 15:46 ` John Robinson
2009-05-30 16:16 ` Maxime Boissonneault
2009-05-30 16:30 ` John Robinson
2009-05-30 16:08 ` Redeeman
2009-05-30 18:39 ` Bill Davidsen
2009-05-30 18:54 ` Goswin von Brederlow
2009-05-31 8:10 ` SandeepKsinha
2009-05-30 18:55 ` Goswin von Brederlow
2009-05-30 19:37 ` Redeeman
2009-05-31 8:02 ` SandeepKsinha
2009-05-31 11:54 ` Goswin von Brederlow
2009-05-31 12:11 ` John Robinson
2009-05-31 12:14 ` NeilBrown
2009-06-03 1:54 ` Greg Freemyer
2009-06-02 18:42 ` Bill Davidsen