* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2da8e03b9aca0078918430.2b2da8e5@umit.maine.edu> @ 2006-09-19 16:36 ` Steve Cousins 2006-09-19 16:58 ` Shailendra Tripathi 2006-09-19 17:13 ` Steve Cousins 0 siblings, 2 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-19 16:36 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs Hi Shailendra, I ran the program and it reports: Level 6, disks=11 spare_disks=1 raid_disks=10 which looks good. I don't understand why you got: Level 5, disks=7 spare_disks=3 raid_disks=5 Why would it have 3 spare_disks? Thanks, Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Your guess appears to be correct. md_ioctl returns nr which > is total number of disk in the array including the spare disks. However, > XFS function md_get_vol_stripe does not take spare disk into account. It > needs to subtract spare_disks as well. > However, md.spare_disks returned by the call returns spare + parity > (both). So, one way could be substract spare_disks directly. Otherwise, > the xfs should rely on md.raid_disks. This does not include spare_disks > and nr.disks should be changed for that. > > When I run my program md_info on raid5 array with 5 devices and 2 > spares, I get > [root@ga09 root]# ./a.out /dev/md11 > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Steve can you please compile the pasted program and run on your system > with md prepared. It takes /dev/md<no> as input. 
> In your case, you should get above line as: > Level 6, disks=11 spare disks=3 raid_disks=10 > > nr=working=active=failed=spare=0; > ITERATE_RDEV(mddev,rdev,tmp) { > nr++; > if (rdev->faulty) > failed++; > else { > working++; > if (rdev->in_sync) > active++; > else > spare++; > } > } > > info.level = mddev->level; > info.size = mddev->size; > info.nr_disks = nr; > .... > info.active_disks = active; > info.working_disks = working; > info.failed_disks = failed; > info.spare_disks = spare; > > -shailendra > The program is pasted below: > md_info.c. Takes /dev/md<no> as name. For example, /dev/md11. > > #include<stdio.h> > #include<fcntl.h> > #include<sys/ioctl.h> > #ifndef MD_MAJOR > #define MD_MAJOR 9 > #endif > > #define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info) > > > struct md_array_info { > __uint32_t major_version; > __uint32_t minor_version; > __uint32_t patch_version; > __uint32_t ctime; > __uint32_t level; > __uint32_t size; > __uint32_t nr_disks; > __uint32_t raid_disks; > __uint32_t md_minor; > __uint32_t not_persistent; > /* > * Generic state information > */ > __uint32_t utime; /* 0 Superblock update time */ > __uint32_t state; /* 1 State bits (clean, ...) 
*/ > __uint32_t active_disks; /* 2 Number of currently active disks */ > __uint32_t working_disks; /* 3 Number of working disks */ > __uint32_t failed_disks; /* 4 Number of failed disks */ > __uint32_t spare_disks; /* 5 Number of spare disks */ > /* > * Personality information > */ > __uint32_t layout; /* 0 the array's physical layout */ > __uint32_t chunk_size; /* 1 chunk size in bytes */ > > }; > > int main(int argc, char *argv[]) > { > struct md_array_info md; > int fd; > > > /* Open device */ > fd = open(argv[1], O_RDONLY); > if (fd == -1) { > printf("Could not open %s\n", argv[1]); > exit(1); > } > if (ioctl(fd, GET_ARRAY_INFO, &md)) { > printf("Error getting MD array info from %s\n", argv[1]); > exit(1); > } > close(fd); > printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n", > md.level, md.nr_disks, > md.spare_disks, md.raid_disks); > return 0; > } > > > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins
@ 2006-09-19 16:58 ` Shailendra Tripathi
  2006-09-19 17:13 ` Steve Cousins
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19 16:58 UTC (permalink / raw)
  To: cousins; +Cc: xfs

> Hi Shailendra,
>
> I ran the program and it reports:
>
> Level 6, disks=11 spare_disks=1 raid_disks=10
>
> which looks good. I don't understand why you got:
>
> Level 5, disks=7 spare_disks=3 raid_disks=5
>
> Why would it have 3 spare_disks?

Perhaps you are running a more recent kernel than mine, and spare_disks
now reports only actual spares. It did seem a little weird that it
reported spare_disks as 3. get_array_info has changed in recent kernels,
and that should explain the difference.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins 2006-09-19 16:58 ` Shailendra Tripathi @ 2006-09-19 17:13 ` Steve Cousins 1 sibling, 0 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-19 17:13 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs On Tue, 19 Sep 2006, Steve Cousins wrote: > > Hi Shailendra, > > I ran the program and it reports: > > Level 6, disks=11 spare_disks=1 raid_disks=10 > > which looks good. I don't understand why you got: To me this looks correct but I was re-reading your original message and you said: > > In your case, you should get above line as: > > Level 6, disks=11 spare disks=3 raid_disks=10 I don't understand why we should expect parity disks to be included as spare disks. Steve > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Why would it have 3 spare_disks? > > Thanks, > > Steve > > ______________________________________________________________________ > Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > > On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > > > Hi Steve, > > Your guess appears to be correct. md_ioctl returns nr which > > is total number of disk in the array including the spare disks. However, > > XFS function md_get_vol_stripe does not take spare disk into account. It > > needs to subtract spare_disks as well. > > However, md.spare_disks returned by the call returns spare + parity > > (both). So, one way could be substract spare_disks directly. Otherwise, > > the xfs should rely on md.raid_disks. This does not include spare_disks > > and nr.disks should be changed for that. 
> > > > When I run my program md_info on raid5 array with 5 devices and 2 > > spares, I get > > [root@ga09 root]# ./a.out /dev/md11 > > Level 5, disks=7 spare_disks=3 raid_disks=5 > > > > Steve can you please compile the pasted program and run on your system > > with md prepared. It takes /dev/md<no> as input. > > In your case, you should get above line as: > > Level 6, disks=11 spare disks=3 raid_disks=10 > > > > nr=working=active=failed=spare=0; > > ITERATE_RDEV(mddev,rdev,tmp) { > > nr++; > > if (rdev->faulty) > > failed++; > > else { > > working++; > > if (rdev->in_sync) > > active++; > > else > > spare++; > > } > > } > > > > info.level = mddev->level; > > info.size = mddev->size; > > info.nr_disks = nr; > > .... > > info.active_disks = active; > > info.working_disks = working; > > info.failed_disks = failed; > > info.spare_disks = spare; > > > > -shailendra > > The program is pasted below: > > md_info.c. Takes /dev/md<no> as name. For example, /dev/md11. > > > > #include<stdio.h> > > #include<fcntl.h> > > #include<sys/ioctl.h> > > #ifndef MD_MAJOR > > #define MD_MAJOR 9 > > #endif > > > > #define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info) > > > > > > struct md_array_info { > > __uint32_t major_version; > > __uint32_t minor_version; > > __uint32_t patch_version; > > __uint32_t ctime; > > __uint32_t level; > > __uint32_t size; > > __uint32_t nr_disks; > > __uint32_t raid_disks; > > __uint32_t md_minor; > > __uint32_t not_persistent; > > /* > > * Generic state information > > */ > > __uint32_t utime; /* 0 Superblock update time */ > > __uint32_t state; /* 1 State bits (clean, ...) 
*/ > > __uint32_t active_disks; /* 2 Number of currently active disks */ > > __uint32_t working_disks; /* 3 Number of working disks */ > > __uint32_t failed_disks; /* 4 Number of failed disks */ > > __uint32_t spare_disks; /* 5 Number of spare disks */ > > /* > > * Personality information > > */ > > __uint32_t layout; /* 0 the array's physical layout */ > > __uint32_t chunk_size; /* 1 chunk size in bytes */ > > > > }; > > > > int main(int argc, char *argv[]) > > { > > struct md_array_info md; > > int fd; > > > > > > /* Open device */ > > fd = open(argv[1], O_RDONLY); > > if (fd == -1) { > > printf("Could not open %s\n", argv[1]); > > exit(1); > > } > > if (ioctl(fd, GET_ARRAY_INFO, &md)) { > > printf("Error getting MD array info from %s\n", argv[1]); > > exit(1); > > } > > close(fd); > > printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n", > > md.level, md.nr_disks, > > md.spare_disks, md.raid_disks); > > return 0; > > } > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu>]
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu> @ 2006-09-19 17:52 ` Steve Cousins 2006-09-19 19:22 ` Steve Cousins 0 siblings, 1 reply; 19+ messages in thread From: Steve Cousins @ 2006-09-19 17:52 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com On Tue, 19 Sep 2006, Shailendra Tripathi wrote: > >> Hi Shailendra, > >> > >> I ran the program and it reports: > >> > >> Level 6, disks=11 spare_disks=1 raid_disks=10 > >> > >> which looks good. I don't understand why you got: > >> > >> Level 5, disks=7 spare_disks=3 raid_disks=5 > >> > >> Why would it have 3 spare_disks? > > > Perhaps you are running more recent kernel than mine, and, spare_disks > now reports only actual spares. It did appear little weired that it > reported spare_disks as 3. get_array_info is changed in recent kernels > and that should explain this difference. This is a 2.6.17 kernel. So, with this in mind, is there a change that I should try in libdisk/md.c? Tim had suggested: s/nr_disks/raid_disks/ Would this be sufficient? Or should nr_disks be initialized as raid_disks and then go into the switch clause? Steve ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 17:52 ` Steve Cousins
@ 2006-09-19 19:22 ` Steve Cousins
  2006-09-19 20:19 ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Cousins @ 2006-09-19 19:22 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com

On Tue, 19 Sep 2006, Steve Cousins wrote:

> This is a 2.6.17 kernel. So, with this in mind, is there a change that I
> should try in libdisk/md.c? Tim had suggested:
>
>	s/nr_disks/raid_disks/
>
> Would this be sufficient? Or should nr_disks be initialized as raid_disks
> and then go into the switch clause?

I ended up just adding:

	md.nr_disks = md.raid_disks;

right before the switch statement, and it worked fine in my situation.
Not sure how this would work with other kernels etc., but I'll let you
figure that out.

Thanks very much for your help.

Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 19:22 ` Steve Cousins
@ 2006-09-19 20:19 ` Shailendra Tripathi
  0 siblings, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19 20:19 UTC (permalink / raw)
  To: cousins; +Cc: xfs@oss.sgi.com

Steve Cousins wrote:

>> a 2.6.17 kernel. So, with this in mind, is there a change that I
>> should try in libdisk/md.c? Tim had suggested:
>>
>>	s/nr_disks/raid_disks/
>>
>> Would this be sufficient? Or should nr_disks be initialized as raid_disks
>> and then go into the switch clause?
>
> I ended up just adding:
>
>	md.nr_disks = md.raid_disks;
>
> right before the switch statement and it worked fine in my situation.
> Not sure how this would work with other kernels etc. but I'll let you
> figure that out.
>
> Thanks very much for your help.
>
> Steve

Hi Steve,
	Technically speaking, you are doing the same thing. However, just
use the function below to avoid any confusion.

int
md_get_subvol_stripe(
	char		*dfile,
	sv_type_t	type,
	int		*sunit,
	int		*swidth,
	int		*sectalign,
	struct stat64	*sb)
{
	if (mnt_is_md_subvol(sb->st_rdev)) {
		struct md_array_info	md;
		int			fd;

		/* Open device */
		fd = open(dfile, O_RDONLY);
		if (fd == -1)
			return 0;

		/* Is this thing on... */
		if (ioctl(fd, GET_ARRAY_INFO, &md)) {
			fprintf(stderr,
				_("Error getting MD array info from %s\n"),
				dfile);
			exit(1);
		}
		close(fd);

		/*
		 * Ignore levels we don't want aligned (e.g. linear)
		 * and deduct disk(s) from stripe width on RAID4/5/6
		 */
		switch (md.level) {
		case 6:
			md.raid_disks--;
			/* fallthrough */
		case 5:
		case 4:
			md.raid_disks--;
			/* fallthrough */
		case 1:
		case 0:
		case 10:
			break;
		default:
			return 0;
		}

		/* Update sizes */
		*sunit = md.chunk_size >> 9;
		*swidth = *sunit * md.raid_disks;
		*sectalign = (md.level == 4 || md.level == 5 || md.level == 6);

		return 1;
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu>]
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu> @ 2006-09-18 20:28 ` Steve Cousins 2006-09-18 20:44 ` Steve Cousins 1 sibling, 0 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-18 20:28 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com> Thanks very much Shailendra. I'll give it a try. Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Both of us are using old xfsprogs. It is handled in new > xfsprogs. > > */ > switch (md.level) { > case 6: > md.nr_disks--; > /* fallthrough */ > case 5: > case 4: > md.nr_disks--; > /* fallthrough */ > case 1: > case 0: > case 10: > break; > default: > return 0; > > > Regards, > > Shailendra Tripathi wrote: > > >> Hi Steve, > >> I checked the code and it appears that XFS is not *aware* > >> of RAID6. Basically, for all md devices, it gets the volume info by > >> making a an ioctl call. I can see that XFS only take care of level 4 > >> and level 5. It does not account for level 6. > >> Only extra line need to be added here as below: > >> > >> if (md.level == 6) > >> md.nr_disks -= 2; /* RAID 6 has 2 parity disks */ > >> You can try with this change if you can. Do let mew know if it solves > >> your problem. 
> >> > >> This code is in function: md_get_subvol_stripe in <xf_progs>/libdisk/md.c > >> > >> > >> /* Deduct a disk from stripe width on RAID4/5 */ > >> if (md.level == 4 || md.level == 5) > >> md.nr_disks--; > >> > >> /* Update sizes */ > >> *sunit = md.chunk_size >> 9; > >> *swidth = *sunit * md.nr_disks; > >> > >> return 1; > >> } > >> > >> Regards, > >> Shailendra > >> Steve Cousins wrote: > >> > >>> Hi Shailendra, > >>> > >>> Here is the info: > >>> > >>> 1. [root@juno ~]# cat /proc/mdstat Personalities : [raid6] md0 : > >>> active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] > >>> sdf[4] sde[3] sdd[2] sdc[1] > >>> 3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10] > >>> [UUUUUUUUUU] > >>> unused devices: <none> > >>> > >>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10 > >>> --spare-devices=1 /dev/sd[bcdefghijkl] > >>> > >>> 3. [root@juno ~]# xfs_db -r /dev/md* > >>> xfs_db> sb > >>> xfs_db> p > >>> magicnum = 0x58465342 > >>> blocksize = 4096 > >>> dblocks = 976772992 > >>> rblocks = 0 > >>> rextents = 0 > >>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4 > >>> logstart = 536870919 > >>> rootino = 256 > >>> rbmino = 257 > >>> rsumino = 258 > >>> rextsize = 144 > >>> agblocks = 30524160 > >>> agcount = 32 > >>> rbmblocks = 0 > >>> logblocks = 32768 > >>> versionnum = 0x3d84 > >>> sectsize = 4096 > >>> inodesize = 256 > >>> inopblock = 16 > >>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000" > >>> blocklog = 12 > >>> sectlog = 12 > >>> inodelog = 8 > >>> inopblog = 4 > >>> agblklog = 25 > >>> rextslog = 0 > >>> inprogress = 0 > >>> imax_pct = 25 > >>> icount = 36864 > >>> ifree = 362 > >>> fdblocks = 669630878 > >>> frextents = 0 > >>> uquotino = 0 > >>> gquotino = 0 > >>> qflags = 0 > >>> flags = 0 > >>> shared_vn = 0 > >>> inoalignmt = 2 > >>> unit = 16 > >>> width = 144 > >>> dirblklog = 0 > >>> logsectlog = 12 > >>> logsectsize = 4096 > >>> logsunit = 4096 > >>> features2 = 0 > >>> xfs_db> > >>> > >>> Thanks 
for the help. > >>> > >>> Steve > >>> > >>> ______________________________________________________________________ > >>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>> > >>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > >>> > >>> > >>> > >>>> Can you list the output of > >>>> 1. cat /proc/mdstat > >>>> 2. the command to create 8+2 RAID6 with one spare ? > >>>> 3. and output of following: > >>>> xfs_db -r /dev/md* > >>>> xfs_db> sb > >>>> xfs_db> p > >>>> > >>>> -shailendra > >>>> > >>>> Steve Cousins wrote: > >>>> > >>>> > >>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one > >>>>>> hot-spare so the number of data drives is 8. I used mkfs.xfs with > >>>>>> defaults to create the file system and it seemed to pick up the > >>>>>> chunk size > >>>>>> I used correctly (64K) but I think it got the swidth wrong. Here > >>>>>> is what > >>>>>> xfs_info says: > >>>>>> > >>>>>> =========================================================================== > >>>>>> > >>>>>> meta-data=/dev/md0 isize=256 agcount=32, > >>>>>> agsize=30524160 > >>>>>> blks > >>>>>> = sectsz=4096 attr=0 > >>>>>> data = bsize=4096 blocks=976772992, > >>>>>> imaxpct=25 > >>>>>> = sunit=16 swidth=144 blks, > >>>>>> unwritten=1 > >>>>>> naming =version 2 bsize=4096 > >>>>>> log =internal bsize=4096 blocks=32768, version=2 > >>>>>> = sectsz=4096 sunit=1 blks > >>>>>> realtime =none extsz=589824 blocks=0, rtextents=0 > >>>>>> =========================================================================== > >>>>>> > >>>>>> > >>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks > >>>>>> like it > >>>>>> thought there were 9 data drives instead of 8. > >>>>>> Am I diagnosing this correctly? Should I recreate the array and > >>>>>> explicitly set sunit=16 and swidth=128? > >>>>>> > >>>>>> Thanks for your help. 
> >>>>>> > >>>>>> Steve > >>>>>> ______________________________________________________________________ > >>>>>> > >>>>>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>>>>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>>>>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>> > >>> > >>> > >> > >> > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu> 2006-09-18 20:28 ` Steve Cousins @ 2006-09-18 20:44 ` Steve Cousins 2006-09-18 21:06 ` Shailendra Tripathi 2006-09-18 22:13 ` Shailendra Tripathi 1 sibling, 2 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-18 20:44 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com> Hi again, Still no luck with 2.8.11: [root@juno xfsprogs-2.8.11]# cd mkfs [root@juno mkfs]# ./mkfs.xfs -f /dev/md0 meta-data=/dev/md0 isize=256 agcount=32, agsize=30524160 blks = sectsz=4096 attr=0 data = bsize=4096 blocks=976772992, imaxpct=25 = sunit=16 swidth=144 blks, unwritten=1 naming =version 2 bsize=4096 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=4096 sunit=1 blks realtime =none extsz=589824 blocks=0, rtextents=0 Since I have a spare in there do you think it is starting with md.nr_disks = 11 and then subtracting two? Thanks, Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Both of us are using old xfsprogs. It is handled in new > xfsprogs. > > */ > switch (md.level) { > case 6: > md.nr_disks--; > /* fallthrough */ > case 5: > case 4: > md.nr_disks--; > /* fallthrough */ > case 1: > case 0: > case 10: > break; > default: > return 0; > > > Regards, > > Shailendra Tripathi wrote: > > >> Hi Steve, > >> I checked the code and it appears that XFS is not *aware* > >> of RAID6. Basically, for all md devices, it gets the volume info by > >> making a an ioctl call. I can see that XFS only take care of level 4 > >> and level 5. It does not account for level 6. 
> >> Only extra line need to be added here as below: > >> > >> if (md.level == 6) > >> md.nr_disks -= 2; /* RAID 6 has 2 parity disks */ > >> You can try with this change if you can. Do let mew know if it solves > >> your problem. > >> > >> This code is in function: md_get_subvol_stripe in <xf_progs>/libdisk/md.c > >> > >> > >> /* Deduct a disk from stripe width on RAID4/5 */ > >> if (md.level == 4 || md.level == 5) > >> md.nr_disks--; > >> > >> /* Update sizes */ > >> *sunit = md.chunk_size >> 9; > >> *swidth = *sunit * md.nr_disks; > >> > >> return 1; > >> } > >> > >> Regards, > >> Shailendra > >> Steve Cousins wrote: > >> > >>> Hi Shailendra, > >>> > >>> Here is the info: > >>> > >>> 1. [root@juno ~]# cat /proc/mdstat Personalities : [raid6] md0 : > >>> active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] > >>> sdf[4] sde[3] sdd[2] sdc[1] > >>> 3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10] > >>> [UUUUUUUUUU] > >>> unused devices: <none> > >>> > >>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10 > >>> --spare-devices=1 /dev/sd[bcdefghijkl] > >>> > >>> 3. 
[root@juno ~]# xfs_db -r /dev/md* > >>> xfs_db> sb > >>> xfs_db> p > >>> magicnum = 0x58465342 > >>> blocksize = 4096 > >>> dblocks = 976772992 > >>> rblocks = 0 > >>> rextents = 0 > >>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4 > >>> logstart = 536870919 > >>> rootino = 256 > >>> rbmino = 257 > >>> rsumino = 258 > >>> rextsize = 144 > >>> agblocks = 30524160 > >>> agcount = 32 > >>> rbmblocks = 0 > >>> logblocks = 32768 > >>> versionnum = 0x3d84 > >>> sectsize = 4096 > >>> inodesize = 256 > >>> inopblock = 16 > >>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000" > >>> blocklog = 12 > >>> sectlog = 12 > >>> inodelog = 8 > >>> inopblog = 4 > >>> agblklog = 25 > >>> rextslog = 0 > >>> inprogress = 0 > >>> imax_pct = 25 > >>> icount = 36864 > >>> ifree = 362 > >>> fdblocks = 669630878 > >>> frextents = 0 > >>> uquotino = 0 > >>> gquotino = 0 > >>> qflags = 0 > >>> flags = 0 > >>> shared_vn = 0 > >>> inoalignmt = 2 > >>> unit = 16 > >>> width = 144 > >>> dirblklog = 0 > >>> logsectlog = 12 > >>> logsectsize = 4096 > >>> logsunit = 4096 > >>> features2 = 0 > >>> xfs_db> > >>> > >>> Thanks for the help. > >>> > >>> Steve > >>> > >>> ______________________________________________________________________ > >>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>> > >>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > >>> > >>> > >>> > >>>> Can you list the output of > >>>> 1. cat /proc/mdstat > >>>> 2. the command to create 8+2 RAID6 with one spare ? > >>>> 3. and output of following: > >>>> xfs_db -r /dev/md* > >>>> xfs_db> sb > >>>> xfs_db> p > >>>> > >>>> -shailendra > >>>> > >>>> Steve Cousins wrote: > >>>> > >>>> > >>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one > >>>>>> hot-spare so the number of data drives is 8. 
I used mkfs.xfs with > >>>>>> defaults to create the file system and it seemed to pick up the > >>>>>> chunk size > >>>>>> I used correctly (64K) but I think it got the swidth wrong. Here > >>>>>> is what > >>>>>> xfs_info says: > >>>>>> > >>>>>> =========================================================================== > >>>>>> > >>>>>> meta-data=/dev/md0 isize=256 agcount=32, > >>>>>> agsize=30524160 > >>>>>> blks > >>>>>> = sectsz=4096 attr=0 > >>>>>> data = bsize=4096 blocks=976772992, > >>>>>> imaxpct=25 > >>>>>> = sunit=16 swidth=144 blks, > >>>>>> unwritten=1 > >>>>>> naming =version 2 bsize=4096 > >>>>>> log =internal bsize=4096 blocks=32768, version=2 > >>>>>> = sectsz=4096 sunit=1 blks > >>>>>> realtime =none extsz=589824 blocks=0, rtextents=0 > >>>>>> =========================================================================== > >>>>>> > >>>>>> > >>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks > >>>>>> like it > >>>>>> thought there were 9 data drives instead of 8. > >>>>>> Am I diagnosing this correctly? Should I recreate the array and > >>>>>> explicitly set sunit=16 and swidth=128? > >>>>>> > >>>>>> Thanks for your help. > >>>>>> > >>>>>> Steve > >>>>>> ______________________________________________________________________ > >>>>>> > >>>>>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>>>>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>>>>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>> > >>> > >>> > >> > >> > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 20:44 ` Steve Cousins
@ 2006-09-18 21:06 ` Shailendra Tripathi
  2006-09-18 22:13 ` Shailendra Tripathi
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 21:06 UTC (permalink / raw)
  To: cousins; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>

> Since I have a spare in there do you think it is starting with md.nr_disks
> = 11 and then subtracting two?

You can verify that very quickly by removing the spare_disks option and
seeing whether it gives proper results.

-shailendra

> Thanks,
>
> Steve
> ______________________________________________________________________
>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>
> On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>
>> Hi Steve,
>>	Both of us are using old xfsprogs. It is handled in new
>> xfsprogs.
>>
>>	*/
>>	switch (md.level) {
>>	case 6:
>>		md.nr_disks--;
>>		/* fallthrough */
>>	case 5:
>>	case 4:
>>		md.nr_disks--;
>>		/* fallthrough */
>>	case 1:
>>	case 0:
>>	case 10:
>>		break;
>>	default:
>>		return 0;
>>
>> Regards,
>>
>> Shailendra Tripathi wrote:

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 20:44 ` Steve Cousins
  2006-09-18 21:06 ` Shailendra Tripathi
@ 2006-09-18 22:13 ` Shailendra Tripathi
  2006-09-19  5:11 ` Timothy Shimmin
  1 sibling, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 22:13 UTC (permalink / raw)
  To: cousins; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Hi Steve,
	Your guess appears to be correct. md_ioctl returns nr, which is
the total number of disks in the array, including the spare disks.
However, the XFS function md_get_vol_stripe does not take spare disks
into account; it needs to subtract spare_disks as well.
	However, the md.spare_disks returned by the call counts spare +
parity disks (both). So, one way would be to subtract spare_disks
directly. Otherwise, XFS should rely on md.raid_disks; this does not
include spare_disks, and nr_disks would need to be changed for that.

When I run my program md_info on a raid5 array with 5 devices and 2
spares, I get:

[root@ga09 root]# ./a.out /dev/md11
Level 5, disks=7 spare_disks=3 raid_disks=5

Steve, can you please compile the pasted program and run it on your
system with the md prepared. It takes /dev/md<no> as input.
In your case, you should get the above line as:
Level 6, disks=11 spare disks=3 raid_disks=10

	nr=working=active=failed=spare=0;
	ITERATE_RDEV(mddev,rdev,tmp) {
		nr++;
		if (rdev->faulty)
			failed++;
		else {
			working++;
			if (rdev->in_sync)
				active++;
			else
				spare++;
		}
	}

	info.level = mddev->level;
	info.size = mddev->size;
	info.nr_disks = nr;
	....
	info.active_disks = active;
	info.working_disks = working;
	info.failed_disks = failed;
	info.spare_disks = spare;

-shailendra

The program is pasted below:
md_info.c. Takes /dev/md<no> as name. For example, /dev/md11.
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <unistd.h>	/* close() */
#include <fcntl.h>
#include <sys/ioctl.h>

#ifndef MD_MAJOR
#define MD_MAJOR 9
#endif

#define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info)

struct md_array_info {
	__uint32_t major_version;
	__uint32_t minor_version;
	__uint32_t patch_version;
	__uint32_t ctime;
	__uint32_t level;
	__uint32_t size;
	__uint32_t nr_disks;
	__uint32_t raid_disks;
	__uint32_t md_minor;
	__uint32_t not_persistent;
	/*
	 * Generic state information
	 */
	__uint32_t utime;		/* 0 Superblock update time */
	__uint32_t state;		/* 1 State bits (clean, ...) */
	__uint32_t active_disks;	/* 2 Number of currently active disks */
	__uint32_t working_disks;	/* 3 Number of working disks */
	__uint32_t failed_disks;	/* 4 Number of failed disks */
	__uint32_t spare_disks;		/* 5 Number of spare disks */
	/*
	 * Personality information
	 */
	__uint32_t layout;		/* 0 the array's physical layout */
	__uint32_t chunk_size;		/* 1 chunk size in bytes */
};

int main(int argc, char *argv[])
{
	struct md_array_info md;
	int fd;

	/* Open device */
	fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		printf("Could not open %s\n", argv[1]);
		exit(1);
	}
	if (ioctl(fd, GET_ARRAY_INFO, &md)) {
		printf("Error getting MD array info from %s\n", argv[1]);
		exit(1);
	}
	close(fd);
	printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n",
		md.level, md.nr_disks,
		md.spare_disks, md.raid_disks);
	return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 2006-09-18 22:13 ` Shailendra Tripathi @ 2006-09-19 5:11 ` Timothy Shimmin 2006-09-19 6:44 ` Shailendra Tripathi 0 siblings, 1 reply; 19+ messages in thread From: Timothy Shimmin @ 2006-09-19 5:11 UTC (permalink / raw) To: Shailendra Tripathi Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com> Hi Shailendra and Steve, Shailendra Tripathi wrote: > Hi Steve, > Your guess appears to be correct. md_ioctl returns nr which > is total number of disk in the array including the spare disks. However, > XFS function md_get_vol_stripe does not take spare disk into account. It > needs to subtract spare_disks as well. > However, md.spare_disks returned by the call returns spare + parity > (both). So, one way could be substract spare_disks directly. Otherwise, > the xfs should rely on md.raid_disks. This does not include spare_disks > and nr.disks should be changed for that. > When I run my program md_info on raid5 array with 5 devices and 2 > spares, I get > [root@ga09 root]# ./a.out /dev/md11 > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Steve can you please compile the pasted program and run on your system > with md prepared. It takes /dev/md<no> as input. > In your case, you should get above line as: > Level 6, disks=11 spare disks=3 raid_disks=10 > > nr=working=active=failed=spare=0; > ITERATE_RDEV(mddev,rdev,tmp) { > nr++; > if (rdev->faulty) > failed++; > else { > working++; > if (rdev->in_sync) > active++; > else > spare++; > } > } > > info.level = mddev->level; > info.size = mddev->size; > info.nr_disks = nr; > .... 
> info.active_disks = active;
> info.working_disks = working;
> info.failed_disks = failed;
> info.spare_disks = spare;
>
> -shailendra

I'm not that au fait with RAID and md, but looking at what you wrote,
Shailendra, and the md code, instead of your suggestions
(what I think are your suggestions:) of:

(1) subtracting parity from md.raid_disks (instead of md.nr_disks),
    where we work out parity by switching on md.level
or
(2) using directly: (md.nr_disks - md.spare_disks);

we could instead use:
(3) md.active_disks directly,

i.e.
	*swidth = *sunit * md.active_disks;

I presume that active is the working disks, excluding spares and parity.

Does that make sense?
--Tim

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19  5:11           ` Timothy Shimmin
@ 2006-09-19  6:44             ` Shailendra Tripathi
  2006-09-19  7:02               ` Timothy Shimmin
  0 siblings, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19  6:44 UTC (permalink / raw)
  To: Timothy Shimmin
  Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Hi Tim,

> I'm not that au fait with RAID and md, but looking at what you wrote,
> Shailendra, and at the md code, instead of your suggestions
> (what I think are your suggestions :) of:
>
> (1) subtracting parity from md.raid_disks (instead of md.nr_disks),
>     where we work out the parity count by switching on md.level
> or
> (2) using directly: (md.nr_disks - md.spare_disks);
>
> we could instead:
> (3) use directly: md.active_disks
>
> i.e.
> 	*swidth = *sunit * md.active_disks;
>
> I presume that "active" means the working disks that are neither
> spares nor parity.
>
> Does that make sense?

I agree with you that for an operational raid, since there would not be
any faulty disks, active_disks should be the number of disks. However, I
am just concerned that active_disks tracks live disks (not failed
disks). If we ever used these commands when the system had a faulty
drive, the information returned wouldn't be correct. Though, from the
XFS perspective, I can't think of where that could happen.
I would still say let's rely on raid_disks to be more conservative;
just my choice.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19  6:44             ` Shailendra Tripathi
@ 2006-09-19  7:02               ` Timothy Shimmin
  0 siblings, 0 replies; 19+ messages in thread
From: Timothy Shimmin @ 2006-09-19  7:02 UTC (permalink / raw)
  To: Shailendra Tripathi
  Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Shailendra Tripathi wrote:
>
> Hi Tim,
>
>> I'm not that au fait with RAID and md, but looking at what you wrote,
>> Shailendra, and at the md code, instead of your suggestions
>> (what I think are your suggestions :) of:
>>
>> (1) subtracting parity from md.raid_disks (instead of md.nr_disks),
>>     where we work out the parity count by switching on md.level
>> or
>> (2) using directly: (md.nr_disks - md.spare_disks);
>>
>> we could instead:
>> (3) use directly: md.active_disks
>>
>> i.e.
>> 	*swidth = *sunit * md.active_disks;
>>
>> I presume that "active" means the working disks that are neither
>> spares nor parity.
>>
>> Does that make sense?
>
> I agree with you that for an operational raid, since there would not
> be any faulty disks, active_disks should be the number of disks.
> However, I am just concerned that active_disks tracks live disks (not
> failed disks). If we ever used these commands when the system had a
> faulty drive, the information returned wouldn't be correct. Though,
> from the XFS perspective, I can't think of where that could happen.
> I would still say let's rely on raid_disks to be more conservative;
> just my choice.

I see your point.
I can just change md_get_subvol_stripe(): s/nr_disks/raid_disks/
I just liked the idea of removing the switch statement, which could
potentially get out of date in the future. Too bad :)

--Tim

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>]
* Re: swidth with mdadm and RAID6
  [not found] <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>
@ 2006-09-18 15:33 ` Steve Cousins
  2006-09-18 18:10   ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Cousins @ 2006-09-18 15:33 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com

Hi Shailendra,

Here is the info:

1. [root@juno ~]# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
sdf[4] sde[3] sdd[2] sdc[1]
      3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
[UUUUUUUUUU]

unused devices: <none>

2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
--spare-devices=1 /dev/sd[bcdefghijkl]

3. [root@juno ~]# xfs_db -r /dev/md*
xfs_db> sb
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 976772992
rblocks = 0
rextents = 0
uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
logstart = 536870919
rootino = 256
rbmino = 257
rsumino = 258
rextsize = 144
agblocks = 30524160
agcount = 32
rbmblocks = 0
logblocks = 32768
versionnum = 0x3d84
sectsize = 4096
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 12
inodelog = 8
inopblog = 4
agblklog = 25
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 36864
ifree = 362
fdblocks = 669630878
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 16
width = 144
dirblklog = 0
logsectlog = 12
logsectsize = 4096
logsunit = 4096
features2 = 0
xfs_db>

Thanks for the help.

Steve
______________________________________________________________________
 Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
 Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
 Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

On Mon, 18 Sep 2006, Shailendra Tripathi wrote:

> Can you list the output of
> 1. cat /proc/mdstat
> 2. the command to create 8+2 RAID6 with one spare ?
> 3. and output of following:
>     xfs_db -r /dev/md*
>     xfs_db> sb
>     xfs_db> p
>
> -shailendra
>
> Steve Cousins wrote:
> >> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
> >> hot-spare so the number of data drives is 8. I used mkfs.xfs with
> >> defaults to create the file system and it seemed to pick up the chunk
> >> size I used correctly (64K) but I think it got the swidth wrong. Here
> >> is what xfs_info says:
> >>
> >> ===========================================================================
> >> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
> >>          =                 sectsz=4096  attr=0
> >> data     =                 bsize=4096   blocks=976772992, imaxpct=25
> >>          =                 sunit=16     swidth=144 blks, unwritten=1
> >> naming   =version 2        bsize=4096
> >> log      =internal         bsize=4096   blocks=32768, version=2
> >>          =                 sectsz=4096  sunit=1 blks
> >> realtime =none             extsz=589824 blocks=0, rtextents=0
> >> ===========================================================================
> >>
> >> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
> >> it thought there were 9 data drives instead of 8.
> >>
> >> Am I diagnosing this correctly? Should I recreate the array and
> >> explicitly set sunit=16 and swidth=128?
> >>
> >> Thanks for your help.
> >>
> >> Steve
> >> ______________________________________________________________________
> >>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
> >>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
> >>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 15:33 ` Steve Cousins
@ 2006-09-18 18:10   ` Shailendra Tripathi
  2006-09-18 18:19     ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 18:10 UTC (permalink / raw)
  To: cousins; +Cc: xfs@oss.sgi.com

Hi Steve,
            I checked the code and it appears that XFS is not *aware*
of RAID6. Basically, for all md devices, it gets the volume info by
making an ioctl call. I can see that XFS only takes care of level 4 and
level 5. It does not account for level 6.
Only one extra line needs to be added, as below:

	if (md.level == 6)
		md.nr_disks -= 2;	/* RAID 6 has 2 parity disks */

You can try with this change if you can. Do let me know if it solves
your problem.

This code is in function md_get_subvol_stripe in <xfsprogs>/libdisk/md.c:

	/* Deduct a disk from stripe width on RAID4/5 */
	if (md.level == 4 || md.level == 5)
		md.nr_disks--;

	/* Update sizes */
	*sunit = md.chunk_size >> 9;
	*swidth = *sunit * md.nr_disks;

	return 1;
}

Regards,
Shailendra

Steve Cousins wrote:
>Hi Shailendra,
>
>Here is the info:
>
>1. [root@juno ~]# cat /proc/mdstat
>Personalities : [raid6]
>md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
>sdf[4] sde[3] sdd[2] sdc[1]
>      3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
>[UUUUUUUUUU]
>
>unused devices: <none>
>
>2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
>--spare-devices=1 /dev/sd[bcdefghijkl]
>
>3. [root@juno ~]# xfs_db -r /dev/md*
>xfs_db> sb
>xfs_db> p
>magicnum = 0x58465342
>blocksize = 4096
>dblocks = 976772992
>rblocks = 0
>rextents = 0
>uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
>logstart = 536870919
>rootino = 256
>rbmino = 257
>rsumino = 258
>rextsize = 144
>agblocks = 30524160
>agcount = 32
>rbmblocks = 0
>logblocks = 32768
>versionnum = 0x3d84
>sectsize = 4096
>inodesize = 256
>inopblock = 16
>fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>blocklog = 12
>sectlog = 12
>inodelog = 8
>inopblog = 4
>agblklog = 25
>rextslog = 0
>inprogress = 0
>imax_pct = 25
>icount = 36864
>ifree = 362
>fdblocks = 669630878
>frextents = 0
>uquotino = 0
>gquotino = 0
>qflags = 0
>flags = 0
>shared_vn = 0
>inoalignmt = 2
>unit = 16
>width = 144
>dirblklog = 0
>logsectlog = 12
>logsectsize = 4096
>logsunit = 4096
>features2 = 0
>xfs_db>
>
>
>Thanks for the help.
>
>Steve
>
>______________________________________________________________________
> Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
> Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
> Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>
>On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>
>>Can you list the output of
>>1. cat /proc/mdstat
>>2. the command to create 8+2 RAID6 with one spare ?
>>3. and output of following:
>>    xfs_db -r /dev/md*
>>    xfs_db> sb
>>    xfs_db> p
>>
>>-shailendra
>>
>>Steve Cousins wrote:
>>
>>>>I have a RAID6 array of 11 500 GB drives using mdadm. There is one
>>>>hot-spare so the number of data drives is 8. I used mkfs.xfs with
>>>>defaults to create the file system and it seemed to pick up the chunk
>>>>size I used correctly (64K) but I think it got the swidth wrong. Here
>>>>is what xfs_info says:
>>>>
>>>>===========================================================================
>>>>meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>>>>         =                 sectsz=4096  attr=0
>>>>data     =                 bsize=4096   blocks=976772992, imaxpct=25
>>>>         =                 sunit=16     swidth=144 blks, unwritten=1
>>>>naming   =version 2        bsize=4096
>>>>log      =internal         bsize=4096   blocks=32768, version=2
>>>>         =                 sectsz=4096  sunit=1 blks
>>>>realtime =none             extsz=589824 blocks=0, rtextents=0
>>>>===========================================================================
>>>>
>>>>So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
>>>>it thought there were 9 data drives instead of 8.
>>>>
>>>>Am I diagnosing this correctly? Should I recreate the array and
>>>>explicitly set sunit=16 and swidth=128?
>>>>
>>>>Thanks for your help.
>>>>
>>>>Steve
>>>>______________________________________________________________________
>>>> Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>>> Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>>> Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 18:10   ` Shailendra Tripathi
@ 2006-09-18 18:19     ` Shailendra Tripathi
  0 siblings, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 18:19 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: cousins, xfs@oss.sgi.com

Hi Steve,
         Both of us are using an old xfsprogs. It is handled in the new
xfsprogs:

	switch (md.level) {
	case 6:
		md.nr_disks--;
		/* fallthrough */
	case 5:
	case 4:
		md.nr_disks--;
		/* fallthrough */
	case 1:
	case 0:
	case 10:
		break;
	default:
		return 0;
	}

Regards,

Shailendra Tripathi wrote:
> Hi Steve,
>             I checked the code and it appears that XFS is not *aware*
> of RAID6. Basically, for all md devices, it gets the volume info by
> making an ioctl call. I can see that XFS only takes care of level 4
> and level 5. It does not account for level 6.
> Only one extra line needs to be added, as below:
>
> 	if (md.level == 6)
> 		md.nr_disks -= 2;	/* RAID 6 has 2 parity disks */
>
> You can try with this change if you can. Do let me know if it solves
> your problem.
>
> This code is in function md_get_subvol_stripe in <xfsprogs>/libdisk/md.c:
>
> 	/* Deduct a disk from stripe width on RAID4/5 */
> 	if (md.level == 4 || md.level == 5)
> 		md.nr_disks--;
>
> 	/* Update sizes */
> 	*sunit = md.chunk_size >> 9;
> 	*swidth = *sunit * md.nr_disks;
>
> 	return 1;
> }
>
> Regards,
> Shailendra
>
> Steve Cousins wrote:
>
>> Hi Shailendra,
>>
>> Here is the info:
>>
>> 1. [root@juno ~]# cat /proc/mdstat
>> Personalities : [raid6]
>> md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
>> sdf[4] sde[3] sdd[2] sdc[1]
>>       3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
>> [UUUUUUUUUU]
>>
>> unused devices: <none>
>>
>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
>> --spare-devices=1 /dev/sd[bcdefghijkl]
>>
>> 3. [root@juno ~]# xfs_db -r /dev/md*
>> xfs_db> sb
>> xfs_db> p
>> magicnum = 0x58465342
>> blocksize = 4096
>> dblocks = 976772992
>> rblocks = 0
>> rextents = 0
>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
>> logstart = 536870919
>> rootino = 256
>> rbmino = 257
>> rsumino = 258
>> rextsize = 144
>> agblocks = 30524160
>> agcount = 32
>> rbmblocks = 0
>> logblocks = 32768
>> versionnum = 0x3d84
>> sectsize = 4096
>> inodesize = 256
>> inopblock = 16
>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>> blocklog = 12
>> sectlog = 12
>> inodelog = 8
>> inopblog = 4
>> agblklog = 25
>> rextslog = 0
>> inprogress = 0
>> imax_pct = 25
>> icount = 36864
>> ifree = 362
>> fdblocks = 669630878
>> frextents = 0
>> uquotino = 0
>> gquotino = 0
>> qflags = 0
>> flags = 0
>> shared_vn = 0
>> inoalignmt = 2
>> unit = 16
>> width = 144
>> dirblklog = 0
>> logsectlog = 12
>> logsectsize = 4096
>> logsunit = 4096
>> features2 = 0
>> xfs_db>
>>
>> Thanks for the help.
>>
>> Steve
>>
>> ______________________________________________________________________
>>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>>
>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>>
>>> Can you list the output of
>>> 1. cat /proc/mdstat
>>> 2. the command to create 8+2 RAID6 with one spare ?
>>> 3. and output of following:
>>>     xfs_db -r /dev/md*
>>>     xfs_db> sb
>>>     xfs_db> p
>>>
>>> -shailendra
>>>
>>> Steve Cousins wrote:
>>>
>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
>>>>> hot-spare so the number of data drives is 8. I used mkfs.xfs with
>>>>> defaults to create the file system and it seemed to pick up the
>>>>> chunk size I used correctly (64K) but I think it got the swidth
>>>>> wrong. Here is what xfs_info says:
>>>>>
>>>>> ===========================================================================
>>>>> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>>>>>          =                 sectsz=4096  attr=0
>>>>> data     =                 bsize=4096   blocks=976772992, imaxpct=25
>>>>>          =                 sunit=16     swidth=144 blks, unwritten=1
>>>>> naming   =version 2        bsize=4096
>>>>> log      =internal         bsize=4096   blocks=32768, version=2
>>>>>          =                 sectsz=4096  sunit=1 blks
>>>>> realtime =none             extsz=589824 blocks=0, rtextents=0
>>>>> ===========================================================================
>>>>>
>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks
>>>>> like it thought there were 9 data drives instead of 8.
>>>>> Am I diagnosing this correctly? Should I recreate the array and
>>>>> explicitly set sunit=16 and swidth=128?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Steve
>>>>> ______________________________________________________________________
>>>>>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>>>>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>>>>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* swidth with mdadm and RAID6
@ 2006-09-15 21:07 Steve Cousins
2006-09-15 23:49 ` Peter Grandi
2006-09-18 14:50 ` Shailendra Tripathi
0 siblings, 2 replies; 19+ messages in thread
From: Steve Cousins @ 2006-09-15 21:07 UTC (permalink / raw)
To: xfs
I have a RAID6 array of 11 500 GB drives using mdadm. There is one
hot-spare, so the number of data drives is 8. I used mkfs.xfs with
defaults to create the file system, and it seemed to pick up the chunk
size I used correctly (64K), but I think it got the swidth wrong. Here
is what xfs_info says:
===========================================================================
meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
         =                 sectsz=4096  attr=0
data     =                 bsize=4096   blocks=976772992, imaxpct=25
         =                 sunit=16     swidth=144 blks, unwritten=1
naming   =version 2        bsize=4096
log      =internal         bsize=4096   blocks=32768, version=2
         =                 sectsz=4096  sunit=1 blks
realtime =none             extsz=589824 blocks=0, rtextents=0
===========================================================================
So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9, so it looks like
it thought there were 9 data drives instead of 8.
Am I diagnosing this correctly? Should I recreate the array and
explicitly set sunit=16 and swidth=128?
Thanks for your help.
Steve
______________________________________________________________________
Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu
Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu
Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302
^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: swidth with mdadm and RAID6
  2006-09-15 21:07 Steve Cousins
@ 2006-09-15 23:49 ` Peter Grandi
  2006-09-18 14:50 ` Shailendra Tripathi
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Grandi @ 2006-09-15 23:49 UTC (permalink / raw)
  To: Linux XFS

>>> On Fri, 15 Sep 2006 17:07:07 -0400 (EDT), Steve Cousins
>>> <cousins@limpet.umeoce.maine.edu> said:

cousins> I have a RAID6 array of 11 500 GB drives using mdadm.
cousins> There is one hot-spare so the number of data drives is
cousins> 8. I used mkfs.xfs with defaults to create the file
cousins> system and it seemed to pick up the chunk size I used
cousins> correctly (64K) but I think it got the swidth wrong.

Worrying about the impact on performance of a relatively small
thing like 'swidth' for something like an 8+2 RAID6 is quite
funny.

  http://WWW.BAARF.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-15 21:07 Steve Cousins
  2006-09-15 23:49 ` Peter Grandi
@ 2006-09-18 14:50 ` Shailendra Tripathi
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 14:50 UTC (permalink / raw)
  To: cousins; +Cc: xfs

Can you list the output of
1. cat /proc/mdstat
2. the command to create 8+2 RAID6 with one spare ?
3. and output of following:
    xfs_db -r /dev/md*
    xfs_db> sb
    xfs_db> p

-shailendra

Steve Cousins wrote:
> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
> hot-spare so the number of data drives is 8. I used mkfs.xfs with
> defaults to create the file system and it seemed to pick up the chunk
> size I used correctly (64K) but I think it got the swidth wrong. Here
> is what xfs_info says:
>
> ===========================================================================
> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>          =                 sectsz=4096  attr=0
> data     =                 bsize=4096   blocks=976772992, imaxpct=25
>          =                 sunit=16     swidth=144 blks, unwritten=1
> naming   =version 2        bsize=4096
> log      =internal         bsize=4096   blocks=32768, version=2
>          =                 sectsz=4096  sunit=1 blks
> realtime =none             extsz=589824 blocks=0, rtextents=0
> ===========================================================================
>
> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
> it thought there were 9 data drives instead of 8.
>
> Am I diagnosing this correctly? Should I recreate the array and
> explicitly set sunit=16 and swidth=128?
>
> Thanks for your help.
>
> Steve
> ______________________________________________________________________
>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-09-19 20:20 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <fc.004c4d192b2da8e03b9aca0078918430.2b2da8e5@umit.maine.edu>
2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins
2006-09-19 16:58 ` Shailendra Tripathi
2006-09-19 17:13 ` Steve Cousins
[not found] <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu>
2006-09-19 17:52 ` Steve Cousins
2006-09-19 19:22 ` Steve Cousins
2006-09-19 20:19 ` Shailendra Tripathi
[not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu>
2006-09-18 20:28 ` Steve Cousins
2006-09-18 20:44 ` Steve Cousins
2006-09-18 21:06 ` Shailendra Tripathi
2006-09-18 22:13 ` Shailendra Tripathi
2006-09-19 5:11 ` Timothy Shimmin
2006-09-19 6:44 ` Shailendra Tripathi
2006-09-19 7:02 ` Timothy Shimmin
[not found] <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>
2006-09-18 15:33 ` Steve Cousins
2006-09-18 18:10 ` Shailendra Tripathi
2006-09-18 18:19 ` Shailendra Tripathi
2006-09-15 21:07 Steve Cousins
2006-09-15 23:49 ` Peter Grandi
2006-09-18 14:50 ` Shailendra Tripathi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox