* RE: Bitmap did not survive reboot [not found] <4AF9ABAA.1020407@redhat.com> @ 2009-11-11 3:34 ` Leslie Rhorer 2009-11-11 3:46 ` Leslie Rhorer ` (2 more replies) 0 siblings, 3 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 3:34 UTC (permalink / raw) To: linux-raid > >> Notice that bring up all raids normally happens before mount > >> filesystems. So, with your bitmaps on a partition that likely isn't > >> mounted when the raids are brought up, how would it ever work? > > > > It's possible it never has, since as I say I don't ordinarily reboot > > these systems. How can I work around this? As you can see from my > original > > post, the boot and root file systems are reiserfs, and we are cautioned > not > > to put the bitmap on anything other than an ext2 or ext3 file system, > which > > is why I created the small /dev/hda4 partition. > > Sure, that makes sense, but given that usually filesystems are on raid > devices and not the other way around, it makes sense that the raids are > brought up first ;-) I see your point. It's certainly a simpler approach. > > I suppose I could put a script in /etc/init.d et al in order to grow > > the array with a bitmap every time the system boots, but that's clumsy, > and > > I could imagine it could create a problem if the array ever comes up > > unclean. > > > > I guess I could also drop to runlevel 0, wipe the root partition, > > recreate it as an ext3, and copy all the files back, but I don't relish > the > > idea. Is there any way to convert a reiserfs partition to an ext3 on > the > > fly? > > > > Oh, and just BTW, Debian does not employ rc.sysinit. > > I don't grok debian unfortunately :-( However, short of essentially > going in and hand editing the startup files to either A) mount the bitmap > mount early or B) both start and mount md0 late, you won't get it to > work. Well, it's supposed to be pretty simple, but I just ran across something very odd. Instead of using an rc.sysinit file, Debian maintains a directory in /etc for each runlevel named rcN.d, where N is the runlevel, plus one named rcS.d and a file named rc.local. The rc.local is run after exiting any multi-user runlevel, and normally does nothing but quit with an exit code of 0. Generally, the files in the rcN.d and rcS.d directories are all just symlinks to scripts in /etc/init.d. The convention is the link names are of the form Sxxyyyy or Kxxyyyy, where xx is a number between 01 and 99 and yyyy is just some mnemonic text. Any link with a leading "K" is taken to be disabled and is thus ignored by the system. The scripts in rcS.d are executed during a system boot, before entering any runlevel, including a single user runlevel. In addition to running everything in rcS.d at boot time, whenever entering runlevel N, all the files in rcN.d are executed. Each file is executed in order by its name. Thus, all the S01 - S10 scripts are run before S20, etc. By the time any S40xxx script runs in rcS.d, all the local file systems should be mounted, networking should be available, and all device drivers should be initialized. By the time any S60xxx script is run, the system clock should be set, any NFS file systems should be mounted (unless they depend upon the automounter), and any file system cleaning should be done. The first RAID script in rcS.d is S25mdadm-raid and the first call to the mount script is S35mountall.sh. Thus, as you say, the RAID systems are loaded before the system attempts to mount anything other than /.
The default runlevel in Debian is 2, so during ordinary booting, everything in rcS.d should run followed by everything in rc2.d. Here's what's weird, and it can't really be correct... I don't think. In both rcS.d and rc2.d (and no doubt others), there are two scripts: RAID-Server:/etc# ll rcS.d/*md* lrwxrwxrwx 1 root root 20 2008-11-21 22:35 rcS.d/S25mdadm-raid -> ../init.d/mdadm-raid lrwxrwxrwx 1 root root 20 2008-12-27 18:35 rcS.d/S99mdadm_monitor -> ../init.d/mdadm-raid RAID-Server:/etc# ll rc2.d/*md* lrwxrwxrwx 1 root root 15 2008-11-21 22:35 rc2.d/S25mdadm -> ../init.d/mdadm lrwxrwxrwx 1 root root 20 2008-12-27 18:36 rc2.d/S99mdadm_monitor -> ../init.d/mdadm-raid Note both S99mdadm_monitor links point to /etc/init.d/mdadm-raid, and so does the S25mdadm-raid script in rc2.d, while the /etc/rc2.d/S25mdadm script points to /etc/init.d/mdadm. The mdadm-raid script starts up the RAID process, and the mdadm script runs the monitor. It seems to me the only link which is really correct is the rcS.d/S25mdadm. At the very least I would think both the S99mdadm_monitor links should point to init.d/mdadm (which, after all is the script which starts the monitor) and that rc2.d/S25mdadm-raid would point to init.d/mdadm, just as the rcS.d/S25mdadm-raid link does. Of course, since the RAID startup script does get called before any of the others, and since the script only shuts down RAID for runlevel 0 (halt) or runlevel 6 (reboot) and not for runlevel 1 - 5 or S, it still works OK, but I don't think it's really correct. Can someone else comment? Getting back to my dilemma, however, I suppose I could simply create an /etc/rcS.d/S24mounthda4 script that explicitly mounts /dev/hda4 to /etc/mdadm/bitmap, or I could modify the init.d/mdadm-raid script to explicitly mount the /dev/hda4 partition if it is not already mounted. Editing the init.d/mdadm-raid script is a bit cleaner and perhaps clearer, but any update to mdadm is liable to wipe out the modifications to the startup script. > If the array isn't super performance critical, I would use mdadm > to delete the bitmap, then grow an internal bitmap with a nice high > chunk size and just go from there. It can't be worse than what you've > got going on now. I really dislike that option. Doing it manually every time I boot would be a pain. Writing a script to do it automatically is no more trouble (or really much different) than writing a script to mount the partition explicitly prior to running mdadm, but it avoids any issues of which I am unaware (but can imagine) with, say, trying to grow a bitmap on an array that is other than clean. I'd rather have mdadm take care of such details. What do you (and the other members of the list) think? ^ permalink raw reply [flat|nested] 30+ messages in thread
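If the init.d/mdadm-raid route were taken, the change could be as small as a guard near the top of that script. A minimal sketch, assuming the paths discussed above and that mountpoint(1) is available on the system:

if ! mountpoint -q /etc/mdadm/bitmap; then
    # the bitmap partition has not been mounted yet, so mount it before
    # any array with an external bitmap is assembled
    mount -t ext2 /dev/hda4 /etc/mdadm/bitmap
fi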
* RE: Bitmap did not survive reboot 2009-11-11 3:34 ` Bitmap did not survive reboot Leslie Rhorer @ 2009-11-11 3:46 ` Leslie Rhorer 2009-11-11 5:22 ` Majed B. 2009-11-11 15:19 ` Gabor Gombas 2009-11-11 20:32 ` Doug Ledford 2 siblings, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 3:46 UTC (permalink / raw) To: linux-raid > The mdadm-raid script starts up the > RAID process, and the mdadm script runs the monitor. It seems to me the > only link which is really correct is the rcS.d/S25mdadm. At the very > least > I would think both the S99mdadm_monitor links should point to init.d/mdadm > (which, after all is the script which starts the monitor) and that > rc2.d/S25mdadm-raid would point to init.d/mdadm, just as the > rcS.d/S25mdadm-raid link does. 'Sorry, I meant to say, "rc2.d/S25mdadm-raid would point to init.d/mdadm-raid, just as the rcS.d/S25mdadm-raid link does." The naming convention here is definitely confusing. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 3:46 ` Leslie Rhorer @ 2009-11-11 5:22 ` Majed B. 2009-11-11 8:13 ` Leslie Rhorer 2009-11-11 9:16 ` Robin Hill 0 siblings, 2 replies; 30+ messages in thread From: Majed B. @ 2009-11-11 5:22 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid Hello Leslie, If you have a temporary space for your data, I'd suggest you move it out and go for an internal bitmap solution. It certainly beats the patch work you're going to have to do on the startup scripts (and every time you update mdadm, or the distro). On Wed, Nov 11, 2009 at 6:46 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote: >> The mdadm-raid script starts up the >> RAID process, and the mdadm script runs the monitor. It seems to me the >> only link which is really correct is the rcS.d/S25mdadm. At the very >> least >> I would think both the S99mdadm_monitor links should point to init.d/mdadm >> (which, after all is the script which starts the monitor) and that >> rc2.d/S25mdadm-raid would point to init.d/mdadm, just as the >> rcS.d/S25mdadm-raid link does. > > 'Sorry, I meant to say, "rc2.d/S25mdadm-raid would point to > init.d/mdadm-raid, just as the rcS.d/S25mdadm-raid link does." The naming > convention here is definitely confusing. > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-11 5:22 ` Majed B. @ 2009-11-11 8:13 ` Leslie Rhorer 2009-11-11 8:19 ` Michael Evans 2009-11-11 9:31 ` Majed B. 1 sibling, 2 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 8:13 UTC (permalink / raw) To: linux-raid > If you have a temporary space for your data, I'd suggest you move it > out and go for an internal bitmap solution. It certainly beats the For 8 Terabytes of data? No, I don't. I'm also not really keen on interrupting the system ( in whatever fashion ) for six to eight days while I copy the data out and back or taking the primary copy offline while I re-do the array just so I can implement an internal bitmap. It's much easier to handle the external situation one way or the other. > patch work you're going to have to do on the startup scripts (and > every time you update mdadm, or the distro). That's why I am leaning strongly toward the lower-numbered script, which in fact I have already done. Of course, it's also trivial to disable it. Updating mdadm or the distro won't affect the mount script. At most I would only have to rename the link, and then only if the mdadm startup link gets re-numbered, which is unlikely. Creating the following script and one symlink to it is hardly "patchwork" in any significant sense:
#! /bin/sh
# Explicitly mount /dev/hda4 prior to running mdadm so the write-intent
# bitmap will be available to mdadm

echo Mounting RAID bitmap...
mount -t ext2 /dev/hda4 /etc/mdadm/bitmap
^ permalink raw reply [flat|nested] 30+ messages in thread
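For completeness, the single symlink mentioned above would be created along these lines, assuming the script is saved under the mounthda4 name proposed earlier in the thread:

chmod +x /etc/init.d/mounthda4
# S24 sorts just ahead of S25mdadm-raid, so the mount happens first
ln -s ../init.d/mounthda4 /etc/rcS.d/S24mounthda4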
* Re: Bitmap did not survive reboot 2009-11-11 8:13 ` Leslie Rhorer @ 2009-11-11 8:19 ` Michael Evans 2009-11-11 8:53 ` Leslie Rhorer 0 siblings, 1 reply; 30+ messages in thread From: Michael Evans @ 2009-11-11 8:19 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid That kind of exception is one of the main areas I think major dists fail: making it so absolutely difficult to insert administratively known requirements at 'odd' points in the boot order. When I last used Debian it was easy with that S## / K## linking system. Arch is another dist that has a way of doing that, except it's based in a core config file. I like Debian's method more because then you can use shell scripts to easily slice/dice/add things at given points. Arch is more than sufficient for normal tasks though. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-11 8:19 ` Michael Evans @ 2009-11-11 8:53 ` Leslie Rhorer 2009-11-11 9:31 ` John Robinson 0 siblings, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 8:53 UTC (permalink / raw) To: linux-raid > That kind of exception is one of the main areas I think major dists > fail. Making it so absolutely difficult to insert administratively > know requirements at 'odd' points in the boot order. When I last used > Debian it was easy with that S## / K## linking system. I agree. It may not be the fastest or the most "sexy" method, but it is solid, simple, and easy to manage, the oddness I discovered this evening notwithstanding. (The backup server doesn't have the issue, only the main video server did. I simply erased the "duplicate" links, so now RAID is started at boot and the monitor is started on entry to runlevel 2 - 5.) > Arch is > another dist that has a way of doing that, except it's based in a core > config file. I like Debian's method more because then you can use > shell scripts to easily slice/dice/add things at given points. Arch > is more than sufficient for normal tasks though. The main reason I like Debian is it stays well away from the bleeding edge. Most distros have a stable and an unstable version, and some have a testing version. Debian has experimental, testing, unstable, and stable, and its testing version is more like what most distros call their stable version. I do really like the approach Debian takes to its booting. It's really easy to troubleshoot a booting issue. Making changes to the boot sequence often doesn't even require editing any files. One can simply rename a link to a higher or lower number to move it about in the boot sequence, or rename it from Sxxyyy to Kxxyyy to disable it. It took me less than 3 minutes total to implement the explicit mount routine on both servers, and unless someone can give me either a much better solution or a solid reason I should not take this approach, I think I'm going to stay with it until such time as I either re-format the root partition or else re-format the array on either respective system. I don't expect the former on either system any time soon. I expect to do the latter on one of the arrays some time in the next three or four months, and the other within a year or so, at which point I may choose the internal bitmap. I don't know, though. I think I rather prefer the external bitmap. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 8:53 ` Leslie Rhorer @ 2009-11-11 9:31 ` John Robinson 2009-11-11 14:52 ` Leslie Rhorer 0 siblings, 1 reply; 30+ messages in thread From: John Robinson @ 2009-11-11 9:31 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid On 11/11/2009 08:53, Leslie Rhorer wrote: [...] > It took me less than 3 minutes total to implement the explicit mount > routine on both servers, and unless someone can give me either a much better > solution or a solid reason I should not take this approach The only problem I can see is that your bitmap is not on a RAID device so if the disc goes you lose the bitmap. I guess you're accepting the risk of downtime anyway because your filesystem root is on a non-RAID device, but while you can (as I think you've said before) replace the boot disc quickly, you're exposing yourself to a long, slow resync which will increase your downtime by perhaps days... Cheers, John. ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-11 9:31 ` John Robinson @ 2009-11-11 14:52 ` Leslie Rhorer 2009-11-11 16:02 ` John Robinson 0 siblings, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 14:52 UTC (permalink / raw) To: 'John Robinson'; +Cc: linux-raid > On 11/11/2009 08:53, Leslie Rhorer wrote: > [...] > > It took me less than 3 minutes total to implement the explicit mount > > routine on both servers, and unless someone can give me either a much > better > > solution or a solid reason I should not take this approach > > The only problem I can see is that your bitmap is not on a RAID device > so if the disc goes you lose the bitmap. I guess you're accepting the True, but since the OS is also on the drive, if it goes I will need to shut down the system anyway. > risk of downtime anyway because your filesystem root is on a non-RAID > device, but while you can (as I think you've said before) replace the > boot disc quickly, you're exposing yourself to a long, slow resync which > will increase your downtime by perhaps days... That presumes a loss of the boot drive *AND* one or two RAID drives. That's pretty unlikely, unless something really nasty happens. The boot drives are not on the same controller or in the same enclosure. They aren't even the same type of drive. The boot drives are PATA drives inside the same enclosure as the respective motherboards. The RAID drives are SATA drives in external RAID enclosures. Right now, a non-bitmap resync takes about a day and a half, if I limit array access. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 14:52 ` Leslie Rhorer @ 2009-11-11 16:02 ` John Robinson 0 siblings, 0 replies; 30+ messages in thread From: John Robinson @ 2009-11-11 16:02 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid On 11/11/2009 14:52, Leslie Rhorer wrote: >> On 11/11/2009 08:53, Leslie Rhorer wrote: [...] >> risk of downtime anyway because your filesystem root is on a non-RAID >> device, but while you can (as I think you've said before) replace the >> boot disc quickly, you're exposing yourself to a long, slow resync which >> will increase your downtime by perhaps days... > > That presumes a loss of the boot drive *AND* one or two RAID drives. > That's pretty unlikely, unless something really nasty happens. I don't think it does - even if none of the RAID drives goes, the system crashing because of the root drive going AWOL might well leave the system with the RAID array marked "dirty" because it wasn't shut down cleanly, which without the bitmap would mean you'd get a full resync when you rebooted. Cheers, John. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 8:13 ` Leslie Rhorer 2009-11-11 8:19 ` Michael Evans @ 2009-11-11 9:31 ` Majed B. 2009-11-11 14:54 ` Leslie Rhorer 1 sibling, 1 reply; 30+ messages in thread From: Majed B. @ 2009-11-11 9:31 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid > mount -t ext2 /dev/hda4 /etc/mdadm/bitmap I would suggest you mount the partition using UUID, just in case one day the disk decided to change its name, like what happened to me a while back. On Wed, Nov 11, 2009 at 11:13 AM, Leslie Rhorer <lrhorer@satx.rr.com> wrote: >> If you have a temporary space for your data, I'd suggest you move it >> out and go for an internal bitmap solution. It certainly beats the > > For 8 Terabytes of data? No, I don't. I'm also not really keen on > interrupting the system ( in whatever fashion ) for six to eight days while > I copy the data out and back or taking the primary copy offline while I > re-do the array just so I can implement an internal bitmap. It's much > easier to handle the external situation one way or the other. > >> patch work you're going to have to do on the startup scripts (and >> every time you update mdadm, or the distro). > > That's why I am leaning strongly toward the lower value script, > which in fact I have already done. Of course, it's also trivial to disable > it. Updating mdadm or the distro won't affect the mount script. At most I > would only have to rename the link, and then only if the mdadm startup link > gets re-numbered, which is unlilkely. Creating the following script and one > symlink to it are hardly "patchwork" in any significant sense: > > #! /bin/sh > # Explicitly mount /dev/hda4 prior to running mdadm so the write-intent > # bitmap will be available to mdadm > > echo Mounting RAID bitmap... > mount -t ext2 /dev/hda4 /etc/mdadm/bitmap > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
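A sketch of the UUID-based variant; the UUID shown is a placeholder, blkid reports the real one:

blkid /dev/hda4
# e.g. /dev/hda4: UUID="3e6be9de-8139-11d1-9106-a43f08d823a6" TYPE="ext2"
mount -t ext2 -U 3e6be9de-8139-11d1-9106-a43f08d823a6 /etc/mdadm/bitmap
# or the equivalent /etc/fstab entry:
# UUID=3e6be9de-8139-11d1-9106-a43f08d823a6 /etc/mdadm/bitmap ext2 defaults 0 1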
* RE: Bitmap did not survive reboot 2009-11-11 9:31 ` Majed B. @ 2009-11-11 14:54 ` Leslie Rhorer 0 siblings, 0 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 14:54 UTC (permalink / raw) To: 'Majed B.'; +Cc: linux-raid > > mount -t ext2 /dev/hda4 /etc/mdadm/bitmap > > I would suggest you mount the partition using UUID, just in case one > day the disk decided to change its name, like what happened to me a > while back. Yeah, I've had that happen with SATA drives quite a bit. That's not a bad idea. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 5:22 ` Majed B. 2009-11-11 8:13 ` Leslie Rhorer @ 2009-11-11 9:16 ` Robin Hill 2009-11-11 15:01 ` Leslie Rhorer 1 sibling, 1 reply; 30+ messages in thread From: Robin Hill @ 2009-11-11 9:16 UTC (permalink / raw) To: linux-raid [-- Attachment #1: Type: text/plain, Size: 775 bytes --] On Wed Nov 11, 2009 at 08:22:26AM +0300, Majed B. wrote: > Hello Leslie, > > If you have a temporary space for your data, I'd suggest you move it > out and go for an internal bitmap solution. It certainly beats the > patch work you're going to have to do on the startup scripts (and > every time you update mdadm, or the distro). > There should be no need to move the data off - you can add an internal bitmap using the --grow option. An internal bitmap does have more of an overhead than an external one though. Cheers, Robin -- ___ ( ' } | Robin Hill <robin@robinhill.me.uk> | / / ) | Little Jim says .... | // !! | "He fallen in de water !!" | [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-11 9:16 ` Robin Hill @ 2009-11-11 15:01 ` Leslie Rhorer 2009-11-11 15:53 ` Robin Hill 2009-11-11 20:35 ` Doug Ledford 0 siblings, 2 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 15:01 UTC (permalink / raw) To: linux-raid > > If you have a temporary space for your data, I'd suggest you move it > > out and go for an internal bitmap solution. It certainly beats the > > patch work you're going to have to do on the startup scripts (and > > every time you update mdadm, or the distro). > > > There should be no need to move the data off - you can add an internal > bitmap using the --grow option. An internal bitmap does have more of an > overhead than an external one though. I thought I remembered reading in the man page that an internal bitmap could only be added when the array was created? Is that incorrect? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 15:01 ` Leslie Rhorer @ 2009-11-11 15:53 ` Robin Hill 2009-11-11 20:35 ` Doug Ledford 1 sibling, 0 replies; 30+ messages in thread From: Robin Hill @ 2009-11-11 15:53 UTC (permalink / raw) To: linux-raid [-- Attachment #1: Type: text/plain, Size: 1230 bytes --] On Wed Nov 11, 2009 at 09:01:46AM -0600, Leslie Rhorer wrote: > > > > If you have a temporary space for your data, I'd suggest you move it > > > out and go for an internal bitmap solution. It certainly beats the > > > patch work you're going to have to do on the startup scripts (and > > > every time you update mdadm, or the distro). > > > > > There should be no need to move the data off - you can add an internal > > bitmap using the --grow option. An internal bitmap does have more of an > > overhead than an external one though. > > I thought I remembered reading in the man page than an internal > bitmap could only be added when the array was created? Is that incorrect? > I've certainly done this with mdadm 2.6.8 - I guess older versions may not be able to though. It's only able to use a limited amount of space though (whatever's left between the metadata and the data/end of disk), so you don't get as much (if any) control of the chunk size. Cheers, Robin -- ___ ( ' } | Robin Hill <robin@robinhill.me.uk> | / / ) | Little Jim says .... | // !! | "He fallen in de water !!" | [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 15:01 ` Leslie Rhorer 2009-11-11 15:53 ` Robin Hill @ 2009-11-11 20:35 ` Doug Ledford 2009-11-11 21:46 ` Ben DJ 2009-11-12 0:23 ` Leslie Rhorer 1 sibling, 2 replies; 30+ messages in thread From: Doug Ledford @ 2009-11-11 20:35 UTC (permalink / raw) To: Leslie Rhorer, Linux RAID Mailing List [-- Attachment #1: Type: text/plain, Size: 1294 bytes --] On 11/11/2009 10:01 AM, Leslie Rhorer wrote: > >>> If you have a temporary space for your data, I'd suggest you move it >>> out and go for an internal bitmap solution. It certainly beats the >>> patch work you're going to have to do on the startup scripts (and >>> every time you update mdadm, or the distro). >>> >> There should be no need to move the data off - you can add an internal >> bitmap using the --grow option. An internal bitmap does have more of an >> overhead than an external one though. > > I thought I remembered reading in the man page than an internal > bitmap could only be added when the array was created? Is that incorrect? Yes, very incorrect. You can use grow to add an internal bitmap later, the only limitation is that the bitmap must be small enough to fit in the reserved space around the superblock. It's in the case that you want to create some super huge, absolutely insanely fine grained bitmap that it must be done at raid device creation time and that's only so it can reserve sufficient space for the bitmap. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 20:35 ` Doug Ledford @ 2009-11-11 21:46 ` Ben DJ 2009-11-11 22:10 ` Robin Hill 2009-11-12 1:35 ` Doug Ledford 2009-11-12 0:23 ` Leslie Rhorer 1 sibling, 2 replies; 30+ messages in thread From: Ben DJ @ 2009-11-11 21:46 UTC (permalink / raw) To: Doug Ledford; +Cc: Leslie Rhorer, Linux RAID Mailing List Hi, On Wed, Nov 11, 2009 at 12:35 PM, Doug Ledford <dledford@redhat.com> wrote: > Yes, very incorrect. You can use grow to add an internal bitmap later, Is that true for RAID-10, as well? I understood "--grow" with RAID-10 wasn't fully capable -- yet. BenDJ -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 21:46 ` Ben DJ @ 2009-11-11 22:10 ` Robin Hill 2009-11-12 1:35 ` Doug Ledford 1 sibling, 0 replies; 30+ messages in thread From: Robin Hill @ 2009-11-11 22:10 UTC (permalink / raw) To: Linux RAID Mailing List [-- Attachment #1: Type: text/plain, Size: 699 bytes --] On Wed Nov 11, 2009 at 01:46:50PM -0800, Ben DJ wrote: > Hi, > > On Wed, Nov 11, 2009 at 12:35 PM, Doug Ledford <dledford@redhat.com> wrote: > > Yes, very incorrect. You can use grow to add an internal bitmap later, > > Is that true for RAID-10, as well? I understood "--grow" with RAID-10 > wasn't fully capable -- yet. > It's true for RAID-10, yes. You can't physically grow the array, but you can definitely add/remove the bitmap. Cheers, Robin -- ___ ( ' } | Robin Hill <robin@robinhill.me.uk> | / / ) | Little Jim says .... | // !! | "He fallen in de water !!" | [-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 21:46 ` Ben DJ 2009-11-11 22:10 ` Robin Hill @ 2009-11-12 1:35 ` Doug Ledford 1 sibling, 0 replies; 30+ messages in thread From: Doug Ledford @ 2009-11-12 1:35 UTC (permalink / raw) To: Ben DJ; +Cc: Leslie Rhorer, Linux RAID Mailing List [-- Attachment #1: Type: text/plain, Size: 715 bytes --] On 11/11/2009 04:46 PM, Ben DJ wrote: > Hi, > > On Wed, Nov 11, 2009 at 12:35 PM, Doug Ledford <dledford@redhat.com> wrote: >> Yes, very incorrect. You can use grow to add an internal bitmap later, > > Is that true for RAID-10, as well? I understood "--grow" with RAID-10 > wasn't fully capable -- yet. I don't know. I never heard anything about raid-10 and grow not being compatible. I'd just set up a couple fakes devices using loopback, create a raid-10, and then try it ;-) -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
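A rough sketch of that loopback experiment; the file names, sizes and the md9 device number are arbitrary:

# create four small sparse files and attach them to loop devices
for i in 0 1 2 3; do
    dd if=/dev/zero of=/tmp/fake$i bs=1M count=0 seek=256 2>/dev/null
    losetup /dev/loop$i /tmp/fake$i
done
# build a throwaway RAID-10, then try growing a bitmap onto it
mdadm --create /dev/md9 --level=10 --raid-devices=4 \
    /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm --grow /dev/md9 --bitmap=internal
cat /proc/mdstat                      # md9 should now show a bitmap line
# tear it all down again
mdadm --stop /dev/md9
for i in 0 1 2 3; do losetup -d /dev/loop$i; done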
* RE: Bitmap did not survive reboot 2009-11-11 20:35 ` Doug Ledford 2009-11-11 21:46 ` Ben DJ @ 2009-11-12 0:23 ` Leslie Rhorer 2009-11-12 1:34 ` Doug Ledford 1 sibling, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-12 0:23 UTC (permalink / raw) To: 'Doug Ledford', 'Linux RAID Mailing List' > > would be a pain. Writing a script to do it automatically is no more > trouble > > (or really much different) than writing a script to mount the partition > > explicitly prior to running mdadm, but it avoids any issues of which I > am > > unaware (but can imagine) with, say, trying to grow a bitmap on an array > > that is other than clean. I'd rather have mdadm take care of such > details. > I think you are overestimating the difficulty of this solution. It's as > simple as: > > mdadm -G /dev/md0 --bitmap=none > mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=32768 (or even higher) No, I was referring to a script which grew an external bitmap on a mounted file system after mdadm had already done its magic. What I was mis-remembering was: > >>> If you have a temporary space for your data, I'd suggest you move it > >>> out and go for an internal bitmap solution. It certainly beats the > >>> patch work you're going to have to do on the startup scripts (and > >>> every time you update mdadm, or the distro). > >>> > >> There should be no need to move the data off - you can add an internal > >> bitmap using the --grow option. An internal bitmap does have more of > an > >> overhead than an external one though. > > > > I thought I remembered reading in the man page than an internal > > bitmap could only be added when the array was created? Is that > incorrect? > > Yes, very incorrect. You can use grow to add an internal bitmap later, I guess I skimmed over the manual rather quickly back then, and I was dealing with serious RAID issues at the time, so I must have improperly inferred the man page to imply this in the section which says, "Note that if you add a bitmap stored in a file which is in a filesystem that is on the raid array being affected, the system will deadlock. The bitmap must be on a separate filesystem" to read something more like, "Note that if you add a bitmap ... the bitmap must be on a separate filesystem. > the only limitation is that the bitmap must be small enough to fit in > the reserved space around the superblock. It's in the case that you > want to create some super huge, absolutely insanely fine grained bitmap > that it must be done at raid device creation time and that's only so it > can reserve sufficient space for the bitmap. How can I know how much space is available? I tried adding the internal bitmap without specifying anything, and it seems to have worked fine. When I created the bitmap in an external file (without specifying the size), it was around 100K, which seems rather small. Both of these systems use un-partitioned disks with XFS mounted directly on the RAID array. One is a 7 drive RAID5 array on 1.5 TB disks and the other is a 10 drive RAID6 array on 1.0TB disks. Both are using a version 1.2 superblock. 
The only thing which jumps out at me is --examine, but it doesn't seem to tell me much: RAID-Server:/usr/share/pyTivo# mdadm --examine /dev/sda /dev/sda: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : 5ff10d73:a096195f:7a646bba:a68986ca Name : RAID-Server:0 (local to host RAID-Server) Creation Time : Sat Apr 25 01:17:12 2009 Raid Level : raid6 Raid Devices : 10 Avail Dev Size : 1953524896 (931.51 GiB 1000.20 GB) Array Size : 15628197888 (7452.11 GiB 8001.64 GB) Used Dev Size : 1953524736 (931.51 GiB 1000.20 GB) Data Offset : 272 sectors Super Offset : 8 sectors State : clean Device UUID : d40c9255:cef0739f:966d448d:e549ada8 Internal Bitmap : 2 sectors from superblock Update Time : Wed Nov 11 18:17:26 2009 Checksum : 9a4cc480 - correct Events : 488380 Chunk Size : 256K Array Slot : 0 (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) Array State : Uuuuuuuuuu Backup:/etc/gadmin-rsync# mdadm --examine /dev/sda /dev/sda: Magic : a92b4efc Version : 1.2 Feature Map : 0x1 Array UUID : 940ae4e4:04057ffc:5e92d2fb:63e3efb7 Name : 'Backup':0 Creation Time : Sun Jul 12 20:44:02 2009 Raid Level : raid5 Raid Devices : 7 Avail Dev Size : 2930276896 (1397.26 GiB 1500.30 GB) Array Size : 17581661184 (8383.59 GiB 9001.81 GB) Used Dev Size : 2930276864 (1397.26 GiB 1500.30 GB) Data Offset : 272 sectors Super Offset : 8 sectors State : clean Device UUID : 6156794f:00807e1b:306ed20d:b81914de Internal Bitmap : 2 sectors from superblock Update Time : Wed Nov 11 11:52:43 2009 Checksum : 12afc60a - correct Events : 10100 Layout : left-symmetric Chunk Size : 256K Array Slot : 0 (0, 1, 2, 3, 4, 5, 6) Array State : Uuuuuuu ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-12 0:23 ` Leslie Rhorer @ 2009-11-12 1:34 ` Doug Ledford 2009-11-12 4:55 ` Leslie Rhorer 0 siblings, 1 reply; 30+ messages in thread From: Doug Ledford @ 2009-11-12 1:34 UTC (permalink / raw) To: Leslie Rhorer; +Cc: 'Linux RAID Mailing List' [-- Attachment #1: Type: text/plain, Size: 4506 bytes --] On 11/11/2009 07:23 PM, Leslie Rhorer wrote: > I guess I skimmed over the manual rather quickly back then, and I > was dealing with serious RAID issues at the time, so I must have improperly > inferred the man page to imply this in the section which says, "Note that > if you add a bitmap stored in a file which is in a filesystem that is on the > raid array being affected, the system will deadlock. The bitmap must be on > a separate filesystem" to read something more like, "Note that if you add a > bitmap ... the bitmap must be on a separate filesystem. Understandable, and now corrected, so no biggie ;-) >> the only limitation is that the bitmap must be small enough to fit in >> the reserved space around the superblock. It's in the case that you >> want to create some super huge, absolutely insanely fine grained bitmap >> that it must be done at raid device creation time and that's only so it >> can reserve sufficient space for the bitmap. > > How can I know how much space is available? I tried adding the > internal bitmap without specifying anything, and it seems to have worked > fine. When I created the bitmap in an external file (without specifying the > size), it was around 100K, which seems rather small. 100k is a huge bitmap. For my 2.5TB array, and a bitmap chunk size of 32768KB, I get the entire in-memory bitmap in 24k (as I recall, the in-memory bitmap is larger than the on-disk bitmap as the on-disk bitmap only stores a dirty/clean bit per chunk where as the in-memory bitmap also includes a counter per chunk so it knows when all outstanding writes complete and it needs to transition to clean, but I could be mis-remembering that). > Both of these systems > use un-partitioned disks with XFS mounted directly on the RAID array. One > is a 7 drive RAID5 array on 1.5 TB disks and the other is a 10 drive RAID6 > array on 1.0TB disks. Both are using a version 1.2 superblock. The only > thing which jumps out at me is --examine, but it doesn't seem to tell me > much: > > RAID-Server:/usr/share/pyTivo# mdadm --examine /dev/sda > /dev/sda: > Magic : a92b4efc > Version : 1.2 > Feature Map : 0x1 > Array UUID : 5ff10d73:a096195f:7a646bba:a68986ca > Name : RAID-Server:0 (local to host RAID-Server) > Creation Time : Sat Apr 25 01:17:12 2009 > Raid Level : raid6 > Raid Devices : 10 > > Avail Dev Size : 1953524896 (931.51 GiB 1000.20 GB) > Array Size : 15628197888 (7452.11 GiB 8001.64 GB) > Used Dev Size : 1953524736 (931.51 GiB 1000.20 GB) > Data Offset : 272 sectors > Super Offset : 8 sectors The above two items are what you need for both version 1.1 and 1.2 superblocks in order to figure things out. The data, aka the filesystem itself, starts at the Data Offset which is 272 sectors. The superblock itself is 8 sectors in from the front of the disk because you have version 1.2 superblocks. So, 272 - 8 - size of the superblock, which is only a sector or two, is how much internal space you have. So, in your case, you have about 132k of space for the bitmap. Version 1.0 superblocks are a little different in that you need to know the actual size of the device and you need the super offset and possibly the used dev size. 
There will be free space between the end of the data and the superblock (super offset - used dev size) and free space after the superblock (actual dev size as given by fdisk (either the size of the device itself on whole disk devices or the size of the partition you are using) - super offset - size of superblock). I don't know which is used by the bitmap, but I seem to recall the bitmap wants to be between the superblock and the end of the data, so I think the used dev size and super offset are the important numbers there. You mentioned that you used the defaults when creating the bitmap. That's likely to hurt your performance. The default bitmap chunk is too small. I would redo it with a larger bitmap chunk. If you look in /proc/mdstat, it should tell you the current bitmap chunk. Given that you stream large sequential files, you could go with an insanely large bitmap chunk and be fine. Something like 65536 or 131072 should be good. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
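Spelling that arithmetic out for the v1.2 numbers quoted above:

(272 - 8) sectors x 512 bytes/sector = 135168 bytes = 132 KiB,
less a sector or two for the superblock itself, leaving roughly
131 KiB of reserved space for an internal bitmap.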
* RE: Bitmap did not survive reboot 2009-11-12 1:34 ` Doug Ledford @ 2009-11-12 4:55 ` Leslie Rhorer 2009-11-12 5:22 ` Doug Ledford 0 siblings, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-12 4:55 UTC (permalink / raw) To: 'Doug Ledford'; +Cc: 'Linux RAID Mailing List' > > Data Offset : 272 sectors > > Super Offset : 8 sectors > > The above two items are what you need for both version 1.1 and 1.2 > superblocks in order to figure things out. The data, aka the filesystem > itself, starts at the Data Offset which is 272 sectors. The superblock > itself is 8 sectors in from the front of the disk because you have > version 1.2 superblocks. So, 272 - 8 - size of the superblock, which is > only a sector or two, is how much internal space you have. So, in your > case, you have about 132k of space for the bitmap. OK. The 10 drive system shows: bitmap: 0/466 pages [0KB], 1024KB chunk The 7 drive system shows: bitmap: 0/350 pages [0KB], 2048KB chunk So you think I should remove both and replace them with mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=65536 ? While most of the files are large video files, there are a fair number which are smaller data files such as those of the IMAP server and Quicken. I don't want performance to be too terrible for them, either. ^ permalink raw reply [flat|nested] 30+ messages in thread
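Spelled out, and mirroring the commands given elsewhere in the thread, that replacement would be:

mdadm -G /dev/md0 --bitmap=none
mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=65536
grep bitmap /proc/mdstat    # confirm the new chunk size took effect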
* Re: Bitmap did not survive reboot 2009-11-12 4:55 ` Leslie Rhorer @ 2009-11-12 5:22 ` Doug Ledford 2009-11-14 21:48 ` Leslie Rhorer 0 siblings, 1 reply; 30+ messages in thread From: Doug Ledford @ 2009-11-12 5:22 UTC (permalink / raw) To: Leslie Rhorer; +Cc: 'Linux RAID Mailing List' [-- Attachment #1: Type: text/plain, Size: 2554 bytes --] On 11/11/2009 11:55 PM, Leslie Rhorer wrote: >>> Data Offset : 272 sectors >>> Super Offset : 8 sectors >> >> The above two items are what you need for both version 1.1 and 1.2 >> superblocks in order to figure things out. The data, aka the filesystem >> itself, starts at the Data Offset which is 272 sectors. The superblock >> itself is 8 sectors in from the front of the disk because you have >> version 1.2 superblocks. So, 272 - 8 - size of the superblock, which is >> only a sector or two, is how much internal space you have. So, in your >> case, you have about 132k of space for the bitmap. > > OK. The 10 drive system shows: > > bitmap: 0/466 pages [0KB], 1024KB chunk > > The 7 drive system shows: > > bitmap: 0/350 pages [0KB], 2048KB chunk > > So you think I should remove both and replace them with > > mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=65536 > > ? > > While most of the files are large video files, there are a fair > number which are smaller data files such as those of the IMAP server and > Quicken. I don't want performance to be too terrible for them, either. Oh yeah, those chunk sizes are waaayyyy too small. Definitely replace them. If it will make you feel better, you can do some performance testing before and after to see why I say so ;-) I would recommend running these tests to check the performance change for yourself: dbench -t 300 -D $mpoint --clients-per-process=4 16 | tail -19 >> $log_file mkdir $mpoint/bonnie chown nobody.nobody $mpoint/bonnie bonnie++ -u nobody:nobody -d $mpoint/bonnie -f -m RAID${lvl}-${num}Disk-${chunk}k -n 64:65536:1024:16 >>$log_file 2>/dev/null tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 4096 >> $log_file tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 16384 >> $log_file Obviously, I pulled these tests out of a script I use where all these various variables are defined. Just replace the variables with something sensible for accessing your array, run them, save off the results, run again with a different chunk size, then please post the results back here as I imagine they would be very informative. Especially the dbench results as I think they are likely to benefit the most from the change. Note: dbench, bonnie++, and tiotest should all be available in the debian repos. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
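With the variables filled in for this setup, the runs might look roughly like this; the /Backup mount point is taken from the fstab in the original post, the log path and -m label are arbitrary, and the --clients-per-process switch is left out since not every dbench build has it:

dbench -t 300 -D /Backup 16 | tail -19 >> /root/bitmap-bench.log
mkdir /Backup/bonnie
chown nobody.nobody /Backup/bonnie
bonnie++ -u nobody:nobody -d /Backup/bonnie -f -m RAID5-7Disk-64M \
    -n 64:65536:1024:16 >> /root/bitmap-bench.log 2>/dev/null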
* RE: Bitmap did not survive reboot 2009-11-12 5:22 ` Doug Ledford @ 2009-11-14 21:48 ` Leslie Rhorer 2009-11-15 11:01 ` Doug Ledford 0 siblings, 1 reply; 30+ messages in thread From: Leslie Rhorer @ 2009-11-14 21:48 UTC (permalink / raw) To: 'Doug Ledford'; +Cc: 'Linux RAID Mailing List' > dbench -t 300 -D $mpoint --clients-per-process=4 16 | tail -19 >> > $log_file > mkdir $mpoint/bonnie > chown nobody.nobody $mpoint/bonnie > bonnie++ -u nobody:nobody -d $mpoint/bonnie -f -m > RAID${lvl}-${num}Disk-${chunk}k -n 64:65536:1024:16 >>$log_file > 2>/dev/null > tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 4096 >> $log_file > tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 16384 >> $log_file > > Obviously, I pulled these tests out of a script I use where all these > various variables are defined. Just replace the variables with > something sensible for accessing your array, run them, save off the > results, run again with a different chunk size, then please post the > results back here as I imagine they would be very informative. > Especially the dbench results as I think they are likely to benefit the > most from the change. Note: dbench, bonnie++, and tiotest should all be > available in the debian repos. I could not find tiotest. Also, the version dbench in the distro does not support the --clients-per-process switch. I'll post the results from the backup system here, and from the primary system in the next post. Backup with bitmap-chunk 65M: 16 363285 109.22 MB/sec execute 286 sec 16 364067 109.05 MB/sec execute 287 sec 16 365960 109.26 MB/sec execute 288 sec 16 366880 109.13 MB/sec execute 289 sec 16 368850 109.35 MB/sec execute 290 sec 16 370444 109.45 MB/sec execute 291 sec 16 372360 109.64 MB/sec execute 292 sec 16 373973 109.74 MB/sec execute 293 sec 16 374821 109.61 MB/sec execute 294 sec 16 376967 109.88 MB/sec execute 295 sec 16 377813 109.77 MB/sec execute 296 sec 16 379422 109.87 MB/sec execute 297 sec 16 381197 110.05 MB/sec execute 298 sec 16 382029 109.92 MB/sec execute 299 sec 16 383868 110.10 MB/sec cleanup 300 sec 16 383868 109.74 MB/sec cleanup 301 sec 16 383868 109.37 MB/sec cleanup 302 sec Throughput 110.101 MB/sec 16 procs Version 1.03d ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP RAID5-7Disk-6 3520M 66779 23 46588 22 127821 29 334.7 2 ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 64:65536:1024/16 35 1 113 2 404 9 36 1 54 1 193 5 Backup with default (2048K) bitmap-chunk 16 306080 91.68 MB/sec execute 286 sec 16 307214 91.69 MB/sec execute 287 sec 16 307922 91.61 MB/sec execute 288 sec 16 308828 91.53 MB/sec execute 289 sec 16 310653 91.78 MB/sec execute 290 sec 16 311926 91.82 MB/sec execute 291 sec 16 313569 92.01 MB/sec execute 292 sec 16 314478 91.96 MB/sec execute 293 sec 16 315578 91.99 MB/sec execute 294 sec 16 317416 92.18 MB/sec execute 295 sec 16 318576 92.25 MB/sec execute 296 sec 16 320391 92.39 MB/sec execute 297 sec 16 321309 92.40 MB/sec execute 298 sec 16 322461 92.42 MB/sec execute 299 sec 16 324486 92.70 MB/sec cleanup 300 sec 16 324486 92.39 MB/sec cleanup 301 sec 16 324486 92.17 MB/sec cleanup 302 sec Throughput 92.6969 MB/sec 16 procs Version 1.03d ------Sequential Output------ --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine 
Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP RAID5-7Disk-2 3520M 38751 14 31738 15 114481 28 279.0 1 ------Sequential Create------ --------Random Create-------- -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 64:65536:1024/16 30 1 104 2 340 8 30 1 64 1 160 4 ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-14 21:48 ` Leslie Rhorer @ 2009-11-15 11:01 ` Doug Ledford 2009-11-15 19:27 ` Leslie Rhorer 0 siblings, 1 reply; 30+ messages in thread From: Doug Ledford @ 2009-11-15 11:01 UTC (permalink / raw) To: Leslie Rhorer; +Cc: 'Linux RAID Mailing List' [-- Attachment #1: Type: text/plain, Size: 6277 bytes --] On 11/14/2009 04:48 PM, Leslie Rhorer wrote: >> dbench -t 300 -D $mpoint --clients-per-process=4 16 | tail -19 >> >> $log_file >> mkdir $mpoint/bonnie >> chown nobody.nobody $mpoint/bonnie >> bonnie++ -u nobody:nobody -d $mpoint/bonnie -f -m >> RAID${lvl}-${num}Disk-${chunk}k -n 64:65536:1024:16 >>$log_file >> 2>/dev/null >> tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 4096 >> $log_file >> tiotest -f 1024 -t 6 -r 1000 -d $mpoint -b 16384 >> $log_file >> >> Obviously, I pulled these tests out of a script I use where all these >> various variables are defined. Just replace the variables with >> something sensible for accessing your array, run them, save off the >> results, run again with a different chunk size, then please post the >> results back here as I imagine they would be very informative. >> Especially the dbench results as I think they are likely to benefit the >> most from the change. Note: dbench, bonnie++, and tiotest should all be >> available in the debian repos. > > I could not find tiotest. Also, the version dbench in the distro > does not support the --clients-per-process switch. I'll post the results > from the backup system here, and from the primary system in the next post. > > Backup with bitmap-chunk 65M: > > 16 363285 109.22 MB/sec execute 286 sec > 16 364067 109.05 MB/sec execute 287 sec > 16 365960 109.26 MB/sec execute 288 sec > 16 366880 109.13 MB/sec execute 289 sec > 16 368850 109.35 MB/sec execute 290 sec > 16 370444 109.45 MB/sec execute 291 sec > 16 372360 109.64 MB/sec execute 292 sec > 16 373973 109.74 MB/sec execute 293 sec > 16 374821 109.61 MB/sec execute 294 sec > 16 376967 109.88 MB/sec execute 295 sec > 16 377813 109.77 MB/sec execute 296 sec > 16 379422 109.87 MB/sec execute 297 sec > 16 381197 110.05 MB/sec execute 298 sec > 16 382029 109.92 MB/sec execute 299 sec > 16 383868 110.10 MB/sec cleanup 300 sec > 16 383868 109.74 MB/sec cleanup 301 sec > 16 383868 109.37 MB/sec cleanup 302 sec Hmmm...interesting. This is not the output I expected. This is the second by second update from the app, not the final results. The tail -19 should have grabbed the final results and looked something like this: Operation Count AvgLat MaxLat ---------------------------------------- NTCreateX 3712699 0.186 297.432 Close 2726300 0.013 168.654 Rename 157340 0.149 161.108 Unlink 750442 0.317 274.044 Qpathinfo 3367128 0.054 297.590 Qfileinfo 586968 0.011 148.788 Qfsinfo 617376 0.921 373.536 Sfileinfo 302636 0.028 151.030 Find 1301556 0.121 309.603 WriteX 1834128 0.125 341.075 ReadX 5825192 0.047 239.368 LockX 12088 0.006 24.543 UnlockX 12088 0.006 23.540 Flush 260391 7.149 520.703 Throughput 385.585 MB/sec 64 clients 16 procs max_latency=661.232 ms This allows comparison of not just the final throughput but also the various activities. Regardless though, 109 average versus 92 average is a very telling story. That's an 18% performance difference and amounts to a *HUGE* factor. 
> Throughput 110.101 MB/sec 16 procs > Version 1.03d ------Sequential Output------ --Sequential Input- > --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec > %CP > RAID5-7Disk-6 3520M 66779 23 46588 22 127821 29 334.7 > 2 > ------Sequential Create------ --------Random > Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- > -Delete-- > files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec > %CP > 64:65536:1024/16 35 1 113 2 404 9 36 1 54 1 193 > 5 Ditto with these bonnie++ numbers, *HUGE* difference. 66MB/s versus 38MB/s, 46MB/s versus 31MB/s, and 127MB/s versus 114MB/s on the sequential stuff. The random numbers are all low enough that I'm not sure I trust them (the random numbers in my test setup are in the thousands, not the hundreds). > > Backup with default (2048K) bitmap-chunk > > > 16 306080 91.68 MB/sec execute 286 sec > 16 307214 91.69 MB/sec execute 287 sec > 16 307922 91.61 MB/sec execute 288 sec > 16 308828 91.53 MB/sec execute 289 sec > 16 310653 91.78 MB/sec execute 290 sec > 16 311926 91.82 MB/sec execute 291 sec > 16 313569 92.01 MB/sec execute 292 sec > 16 314478 91.96 MB/sec execute 293 sec > 16 315578 91.99 MB/sec execute 294 sec > 16 317416 92.18 MB/sec execute 295 sec > 16 318576 92.25 MB/sec execute 296 sec > 16 320391 92.39 MB/sec execute 297 sec > 16 321309 92.40 MB/sec execute 298 sec > 16 322461 92.42 MB/sec execute 299 sec > 16 324486 92.70 MB/sec cleanup 300 sec > 16 324486 92.39 MB/sec cleanup 301 sec > 16 324486 92.17 MB/sec cleanup 302 sec > > Throughput 92.6969 MB/sec 16 procs > Version 1.03d ------Sequential Output------ --Sequential Input- > --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec > %CP > RAID5-7Disk-2 3520M 38751 14 31738 15 114481 28 279.0 > 1 > ------Sequential Create------ --------Random > Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- > -Delete-- > files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec > %CP > 64:65536:1024/16 30 1 104 2 340 8 30 1 64 1 160 > 4 -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-15 11:01 ` Doug Ledford @ 2009-11-15 19:27 ` Leslie Rhorer 0 siblings, 0 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-15 19:27 UTC (permalink / raw) To: 'Doug Ledford'; +Cc: 'Linux RAID Mailing List' > This allows comparison of not just the final throughput but also the > various activities. Regardless though, 109 average versus 92 average is > a very telling story. That's an 18% performance difference and amounts > to a *HUGE* factor. Well, not so much. Remember, there is only 1 link for ingress / egress on these machines - a single Gig-E link. Getting much over 90 MBps would be a challenge. Really the only process running on this machine is an rsync daemon which runs at 04:00 every morning, and I really don't care if the rsync takes an extra 10 minutes or some such. Of course, in the event of having to copy the entire data set to a failed array, any extra performance would be welcome, but I'm really not concerned about it. Now if this were one of my commercial production servers, it would be a different matter, but this is for my house, and it is only a backup unit. That doesn't mean I am going to revert to the smaller bitmap chunk, though. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot 2009-11-11 3:34 ` Bitmap did not survive reboot Leslie Rhorer 2009-11-11 3:46 ` Leslie Rhorer @ 2009-11-11 15:19 ` Gabor Gombas 2009-11-11 16:48 ` Leslie Rhorer 2009-11-11 20:32 ` Doug Ledford 2 siblings, 1 reply; 30+ messages in thread From: Gabor Gombas @ 2009-11-11 15:19 UTC (permalink / raw) To: Leslie Rhorer; +Cc: linux-raid On Tue, Nov 10, 2009 at 09:34:09PM -0600, Leslie Rhorer wrote: > RAID-Server:/etc# ll rcS.d/*md* > lrwxrwxrwx 1 root root 20 2008-11-21 22:35 rcS.d/S25mdadm-raid -> > ../init.d/mdadm-raid > lrwxrwxrwx 1 root root 20 2008-12-27 18:35 rcS.d/S99mdadm_monitor -> > ../init.d/mdadm-raid > RAID-Server:/etc# ll rc2.d/*md* > lrwxrwxrwx 1 root root 15 2008-11-21 22:35 rc2.d/S25mdadm -> ../init.d/mdadm > lrwxrwxrwx 1 root root 20 2008-12-27 18:36 rc2.d/S99mdadm_monitor -> > ../init.d/mdadm-raid What Debian version do you have? There are no mdadm_monitor links in lenny, and I do not have etch systems anymore to check. > Getting back to my dilemma, however, I suppose I could simply create > an /etc/rcS.d/S24mounthda4 script that explicitly mounts /dev/hda4 to > /etc/mdadm/bitmap, or I could modify the init.d/mdadm-raid script to > explicitly mount the /dev/hda4 partition if it is not already mounted. > Editing the init.d/mdadm-raid script is a bit cleaner and perhaps clearer, > but any update to mdadm is liable to wipe out the modifications to the > startup script. In Debian, modifications to init scripts are preserved during upgrade unless you explicitly request them to be overwritten. Gabor -- --------------------------------------------------------- MTA SZTAKI Computer and Automation Research Institute Hungarian Academy of Sciences --------------------------------------------------------- ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: Bitmap did not survive reboot 2009-11-11 15:19 ` Gabor Gombas @ 2009-11-11 16:48 ` Leslie Rhorer 0 siblings, 0 replies; 30+ messages in thread From: Leslie Rhorer @ 2009-11-11 16:48 UTC (permalink / raw) To: 'Gabor Gombas'; +Cc: linux-raid > > RAID-Server:/etc# ll rcS.d/*md* > > lrwxrwxrwx 1 root root 20 2008-11-21 22:35 rcS.d/S25mdadm-raid -> > > ../init.d/mdadm-raid > > lrwxrwxrwx 1 root root 20 2008-12-27 18:35 rcS.d/S99mdadm_monitor -> > > ../init.d/mdadm-raid > > RAID-Server:/etc# ll rc2.d/*md* > > lrwxrwxrwx 1 root root 15 2008-11-21 22:35 rc2.d/S25mdadm -> > ../init.d/mdadm > > lrwxrwxrwx 1 root root 20 2008-12-27 18:36 rc2.d/S99mdadm_monitor -> > > ../init.d/mdadm-raid > > What Debian version do you have? There are no mdadm_monitor links in > lenny, and I do not have etch systems anymore to check. Lenny. I'm not sure where they came from. It might even have been me playing around at some point, and I just forgot to delete them. I did a lot of fiddling with mdadm about a year ago. > > Getting back to my dilema, however, I suppose I could simply create > > an /etc/rcS.d/S24mounthda4 script that explicitly mounts /dev/hda4 to > > /etc/mdadm/bitmap, or I could modify the init.d/mdadm-raid script to > > explicitly mount the /dev/hda partition if it is not already mounted. > > Editing the init.d/mdadm-raid script is a bit cleaner and perhaps > clearer, > > but any update to mdadm is liable to wipe out the modifications to the > > startup script. > > In Debian, modifications to init scripts are preserved during upgrade > unless you explicitely request them to be overwritten. I didn't know that, but even so, it's probably at least somewhat better not to modify a script unless it's necessary, because the new package might have some important differences in the script. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Bitmap did not survive reboot
  2009-11-11  3:34 ` Bitmap did not survive reboot Leslie Rhorer
  2009-11-11  3:46 ` Leslie Rhorer
  2009-11-11 15:19 ` Gabor Gombas
@ 2009-11-11 20:32 ` Doug Ledford
  2 siblings, 0 replies; 30+ messages in thread
From: Doug Ledford @ 2009-11-11 20:32 UTC (permalink / raw)
To: Leslie Rhorer; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2544 bytes --]

On 11/10/2009 10:34 PM, Leslie Rhorer wrote:
>> If the array isn't super performance critical, I would use mdadm
>> to delete the bitmap, then grow an internal bitmap with a nice high
>> chunk size and just go from there. It can't be worse than what you've
>> got going on now.
>
> I really dislike that option. Doing it manually every time I boot
> would be a pain. Writing a script to do it automatically is no more trouble
> (or really much different) than writing a script to mount the partition
> explicitly prior to running mdadm, but it avoids any issues of which I am
> unaware (but can imagine) with, say, trying to grow a bitmap on an array
> that is other than clean. I'd rather have mdadm take care of such details.

I think you are overestimating the difficulty of this solution. It's as
simple as:

mdadm -G /dev/md0 --bitmap=none
mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=32768 (or even higher)

and you are done. It won't need to resync the entire device, since the
device is already clean, and it won't create a bitmap that's too large for
the free space that currently exists between the superblock and the start
of your data. You can see the free space available for a bitmap by running
mdadm -E on one of the block devices and interpreting the data start/data
offset/superblock offset fields (sorry there isn't a simple field to look
at, but the math changes depending on which superblock version you use, and
I can't remember if I've ever known which superblock you happen to have).
No need to copy stuff around, no need to take things down, all done in
place, and the issue is solved permanently with no need to muck around in
your system init scripts: from now on, when you boot up, the bitmap is
internal to the array and will be used from the second the array is
assembled.

The only reason I mentioned anything about performance is because an
internal bitmap does slightly slow down random access to an array (although
not so much streaming access), but that slowdown is mitigated by using a
nice high bitmap chunk size (and for most people a big bitmap chunk is
preferable anyway). As I recall, you are serving video files, so your
access pattern is large streaming I/O, and that means the bitmap really
shouldn't be noticeable in your performance.

--
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply [flat|nested] 30+ messages in thread
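A sketch of the sequence Doug describes, with a verification step appended.
The 32768 chunk size is the example value from his message; the member device
name and the verification commands are assumptions about how one might confirm
the result, not part of the thread.

# Inspect an existing member superblock to see how much room lies between the
# superblock and the start of the data (space an internal bitmap can use).
mdadm -E /dev/sda

# Drop the (now missing) external bitmap from the array.
mdadm -G /dev/md0 --bitmap=none

# Add an internal bitmap with a large chunk size to keep write overhead low.
mdadm -G /dev/md0 --bitmap=internal --bitmap-chunk=32768

# Confirm the bitmap is active and will come back automatically on reboot.
mdadm --detail /dev/md0 | grep -i bitmap
cat /proc/mdstat

If this route is taken, the bitmap=/etc/mdadm/bitmap/md0.map parameter on the
ARRAY line in mdadm.conf would presumably also need to be removed, since the
bitmap would no longer live in that file.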
* Bitmap did not survive reboot
@ 2009-11-10  1:44 Leslie Rhorer
  2009-11-10  1:58 ` Leslie Rhorer
  0 siblings, 1 reply; 30+ messages in thread
From: Leslie Rhorer @ 2009-11-10 1:44 UTC (permalink / raw)
To: linux-raid

I had to reboot one of my Linux systems a few days ago (November 2) because
something was a little unstable, although RAID was AFAIK working just fine.
This is not an online production system, so rather than try to run down the
culprit, I just rebooted the box. Everything seemed to come back up just
fine, so I really didn't spend too much time checking everything out. Today
one of the drives in the RAID5 array was kicked out, so I removed it and
added it back. It wasn't until I added the drive back that I noticed the
array no longer had a write-intent bitmap. The array had an external bitmap,
but it is no longer there, and I presume for some reason it was not
registered when the box rebooted. I don't see anything which looks like a
failure related to md in the logs.

The external bitmap is in an ext2 file system in a partition of the boot
drive, so the file should be available during boot prior to building the
RAID array. What could be causing the bitmap to drop out? This isn't the
first time it has happened. I searched /var/log for the string "md" to find
the messages related to activity on the array. Is there some other string
for which I should search?

Here is /etc/fstab:

Backup:/var/log# cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system>  <mount point>      <type>       <options>    <dump> <pass>
proc             /proc              proc         defaults     0      0
/dev/hda2        /                  reiserfs     defaults     0      1
/dev/hda1        /boot              reiserfs     notail       0      2
/dev/hda4        /etc/mdadm/bitmap  ext2         defaults     0      1
/dev/hda5        none               swap         sw           0      0
/dev/hdb         /media/cdrom0      udf,iso9660  user,noauto  0      0
/dev/md0         /Backup            xfs          defaults     0      2

Here is /etc/mdadm/mdadm.conf:

Backup:/etc/mdadm# cat mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR lrhorer@satx.rr.com

# definitions of existing MD arrays

# This file was auto-generated on Thu, 14 May 2009 20:25:57 -0500
# by mkconf $Id$
PROGRAM /usr/bin/mdadm_notify
DEVICE /dev/sd[a-g]
ARRAY /dev/md0 level=raid5 metadata=1.2 num-devices=7 UUID=940ae4e4:04057ffc:5e92d2fb:63e3efb7 name='Backup':0 bitmap=/etc/mdadm/bitmap/md0.map

^ permalink raw reply [flat|nested] 30+ messages in thread
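When chasing this kind of problem, it may help to confirm right after a reboot
whether the external bitmap is actually attached, before any drive gets kicked.
One possible set of checks, assuming the paths from the configuration above
(the exact wording of the mdadm output varies by version):

# Is the bitmap partition mounted where mdadm.conf expects the bitmap file?
grep /etc/mdadm/bitmap /proc/mounts

# Does the assembled array report a bitmap at all?
mdadm --detail /dev/md0 | grep -i bitmap
cat /proc/mdstat

# Examine the external bitmap file itself (events counter, dirty bits).
mdadm -X /etc/mdadm/bitmap/md0.map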
* RE: Bitmap did not survive reboot
  2009-11-10  1:44 Leslie Rhorer
@ 2009-11-10  1:58 ` Leslie Rhorer
  0 siblings, 0 replies; 30+ messages in thread
From: Leslie Rhorer @ 2009-11-10 1:58 UTC (permalink / raw)
To: linux-raid

> I had to reboot one of my Linux systems a few days ago (November 2)
> because something was a little unstable, although RAID was AFAIK working
> just fine. This is not an online production system, so rather than try to
> run down the culprit, I just rebooted the box. Everything seemed to come
> back up just fine, so I really didn't spend too much time checking
> everything out. Today one of the drives in the RAID5 array was kicked
> out, so I removed it and added it back. It wasn't until I added the drive
> back that I noticed the array no longer had a write-intent bitmap. The
> array had an external bitmap, but it is no longer there, and I presume
> for some reason it was not registered when the box rebooted. I don't see
> anything which

Oh, hey, I just looked at one of my other Linux systems, which was shut
down during a protracted power outage 16 days ago, and it, too, is missing
its bitmap, presumably since it was brought back up.

^ permalink raw reply [flat|nested] 30+ messages in thread
Thread overview: 30+ messages (newest: 2009-11-15 19:27 UTC)
-- links below jump to the message on this page --
[not found] <4AF9ABAA.1020407@redhat.com>
2009-11-11 3:34 ` Bitmap did not survive reboot Leslie Rhorer
2009-11-11 3:46 ` Leslie Rhorer
2009-11-11 5:22 ` Majed B.
2009-11-11 8:13 ` Leslie Rhorer
2009-11-11 8:19 ` Michael Evans
2009-11-11 8:53 ` Leslie Rhorer
2009-11-11 9:31 ` John Robinson
2009-11-11 14:52 ` Leslie Rhorer
2009-11-11 16:02 ` John Robinson
2009-11-11 9:31 ` Majed B.
2009-11-11 14:54 ` Leslie Rhorer
2009-11-11 9:16 ` Robin Hill
2009-11-11 15:01 ` Leslie Rhorer
2009-11-11 15:53 ` Robin Hill
2009-11-11 20:35 ` Doug Ledford
2009-11-11 21:46 ` Ben DJ
2009-11-11 22:10 ` Robin Hill
2009-11-12 1:35 ` Doug Ledford
2009-11-12 0:23 ` Leslie Rhorer
2009-11-12 1:34 ` Doug Ledford
2009-11-12 4:55 ` Leslie Rhorer
2009-11-12 5:22 ` Doug Ledford
2009-11-14 21:48 ` Leslie Rhorer
2009-11-15 11:01 ` Doug Ledford
2009-11-15 19:27 ` Leslie Rhorer
2009-11-11 15:19 ` Gabor Gombas
2009-11-11 16:48 ` Leslie Rhorer
2009-11-11 20:32 ` Doug Ledford
2009-11-10 1:44 Leslie Rhorer
2009-11-10 1:58 ` Leslie Rhorer