* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2da8e03b9aca0078918430.2b2da8e5@umit.maine.edu> @ 2006-09-19 16:36 ` Steve Cousins 2006-09-19 16:58 ` Shailendra Tripathi 2006-09-19 17:13 ` Steve Cousins 0 siblings, 2 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-19 16:36 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs Hi Shailendra, I ran the program and it reports: Level 6, disks=11 spare_disks=1 raid_disks=10 which looks good. I don't understand why you got: Level 5, disks=7 spare_disks=3 raid_disks=5 Why would it have 3 spare_disks? Thanks, Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Your guess appears to be correct. md_ioctl returns nr which > is total number of disk in the array including the spare disks. However, > XFS function md_get_vol_stripe does not take spare disk into account. It > needs to subtract spare_disks as well. > However, md.spare_disks returned by the call returns spare + parity > (both). So, one way could be substract spare_disks directly. Otherwise, > the xfs should rely on md.raid_disks. This does not include spare_disks > and nr.disks should be changed for that. > > When I run my program md_info on raid5 array with 5 devices and 2 > spares, I get > [root@ga09 root]# ./a.out /dev/md11 > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Steve can you please compile the pasted program and run on your system > with md prepared. It takes /dev/md<no> as input. 
> In your case, you should get above line as: > Level 6, disks=11 spare disks=3 raid_disks=10 > > nr=working=active=failed=spare=0; > ITERATE_RDEV(mddev,rdev,tmp) { > nr++; > if (rdev->faulty) > failed++; > else { > working++; > if (rdev->in_sync) > active++; > else > spare++; > } > } > > info.level = mddev->level; > info.size = mddev->size; > info.nr_disks = nr; > .... > info.active_disks = active; > info.working_disks = working; > info.failed_disks = failed; > info.spare_disks = spare; > > -shailendra > The program is pasted below: > md_info.c. Takes /dev/md<no> as name. For example, /dev/md11. > > #include<stdio.h> > #include<fcntl.h> > #include<sys/ioctl.h> > #ifndef MD_MAJOR > #define MD_MAJOR 9 > #endif > > #define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info) > > > struct md_array_info { > __uint32_t major_version; > __uint32_t minor_version; > __uint32_t patch_version; > __uint32_t ctime; > __uint32_t level; > __uint32_t size; > __uint32_t nr_disks; > __uint32_t raid_disks; > __uint32_t md_minor; > __uint32_t not_persistent; > /* > * Generic state information > */ > __uint32_t utime; /* 0 Superblock update time */ > __uint32_t state; /* 1 State bits (clean, ...) 
*/ > __uint32_t active_disks; /* 2 Number of currently active disks */ > __uint32_t working_disks; /* 3 Number of working disks */ > __uint32_t failed_disks; /* 4 Number of failed disks */ > __uint32_t spare_disks; /* 5 Number of spare disks */ > /* > * Personality information > */ > __uint32_t layout; /* 0 the array's physical layout */ > __uint32_t chunk_size; /* 1 chunk size in bytes */ > > }; > > int main(int argc, char *argv[]) > { > struct md_array_info md; > int fd; > > > /* Open device */ > fd = open(argv[1], O_RDONLY); > if (fd == -1) { > printf("Could not open %s\n", argv[1]); > exit(1); > } > if (ioctl(fd, GET_ARRAY_INFO, &md)) { > printf("Error getting MD array info from %s\n", argv[1]); > exit(1); > } > close(fd); > printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n", > md.level, md.nr_disks, > md.spare_disks, md.raid_disks); > return 0; > } > > > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins
@ 2006-09-19 16:58 ` Shailendra Tripathi
  2006-09-19 17:13 ` Steve Cousins
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19 16:58 UTC (permalink / raw)
  To: cousins; +Cc: xfs

> Hi Shailendra,
>
> I ran the program and it reports:
>
> Level 6, disks=11 spare_disks=1 raid_disks=10
>
> which looks good. I don't understand why you got:
>
> Level 5, disks=7 spare_disks=3 raid_disks=5
>
> Why would it have 3 spare_disks?

Perhaps you are running a more recent kernel than mine, and spare_disks
now reports only actual spares. It did seem a little weird that it
reported spare_disks as 3. get_array_info has changed in recent kernels,
and that should explain the difference.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins 2006-09-19 16:58 ` Shailendra Tripathi @ 2006-09-19 17:13 ` Steve Cousins 1 sibling, 0 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-19 17:13 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs On Tue, 19 Sep 2006, Steve Cousins wrote: > > Hi Shailendra, > > I ran the program and it reports: > > Level 6, disks=11 spare_disks=1 raid_disks=10 > > which looks good. I don't understand why you got: To me this looks correct but I was re-reading your original message and you said: > > In your case, you should get above line as: > > Level 6, disks=11 spare disks=3 raid_disks=10 I don't understand why we should expect parity disks to be included as spare disks. Steve > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Why would it have 3 spare_disks? > > Thanks, > > Steve > > ______________________________________________________________________ > Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > > On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > > > Hi Steve, > > Your guess appears to be correct. md_ioctl returns nr which > > is total number of disk in the array including the spare disks. However, > > XFS function md_get_vol_stripe does not take spare disk into account. It > > needs to subtract spare_disks as well. > > However, md.spare_disks returned by the call returns spare + parity > > (both). So, one way could be substract spare_disks directly. Otherwise, > > the xfs should rely on md.raid_disks. This does not include spare_disks > > and nr.disks should be changed for that. 
> > > > When I run my program md_info on raid5 array with 5 devices and 2 > > spares, I get > > [root@ga09 root]# ./a.out /dev/md11 > > Level 5, disks=7 spare_disks=3 raid_disks=5 > > > > Steve can you please compile the pasted program and run on your system > > with md prepared. It takes /dev/md<no> as input. > > In your case, you should get above line as: > > Level 6, disks=11 spare disks=3 raid_disks=10 > > > > nr=working=active=failed=spare=0; > > ITERATE_RDEV(mddev,rdev,tmp) { > > nr++; > > if (rdev->faulty) > > failed++; > > else { > > working++; > > if (rdev->in_sync) > > active++; > > else > > spare++; > > } > > } > > > > info.level = mddev->level; > > info.size = mddev->size; > > info.nr_disks = nr; > > .... > > info.active_disks = active; > > info.working_disks = working; > > info.failed_disks = failed; > > info.spare_disks = spare; > > > > -shailendra > > The program is pasted below: > > md_info.c. Takes /dev/md<no> as name. For example, /dev/md11. > > > > #include<stdio.h> > > #include<fcntl.h> > > #include<sys/ioctl.h> > > #ifndef MD_MAJOR > > #define MD_MAJOR 9 > > #endif > > > > #define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info) > > > > > > struct md_array_info { > > __uint32_t major_version; > > __uint32_t minor_version; > > __uint32_t patch_version; > > __uint32_t ctime; > > __uint32_t level; > > __uint32_t size; > > __uint32_t nr_disks; > > __uint32_t raid_disks; > > __uint32_t md_minor; > > __uint32_t not_persistent; > > /* > > * Generic state information > > */ > > __uint32_t utime; /* 0 Superblock update time */ > > __uint32_t state; /* 1 State bits (clean, ...) 
*/ > > __uint32_t active_disks; /* 2 Number of currently active disks */ > > __uint32_t working_disks; /* 3 Number of working disks */ > > __uint32_t failed_disks; /* 4 Number of failed disks */ > > __uint32_t spare_disks; /* 5 Number of spare disks */ > > /* > > * Personality information > > */ > > __uint32_t layout; /* 0 the array's physical layout */ > > __uint32_t chunk_size; /* 1 chunk size in bytes */ > > > > }; > > > > int main(int argc, char *argv[]) > > { > > struct md_array_info md; > > int fd; > > > > > > /* Open device */ > > fd = open(argv[1], O_RDONLY); > > if (fd == -1) { > > printf("Could not open %s\n", argv[1]); > > exit(1); > > } > > if (ioctl(fd, GET_ARRAY_INFO, &md)) { > > printf("Error getting MD array info from %s\n", argv[1]); > > exit(1); > > } > > close(fd); > > printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n", > > md.level, md.nr_disks, > > md.spare_disks, md.raid_disks); > > return 0; > > } > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu>]
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu> @ 2006-09-19 17:52 ` Steve Cousins 2006-09-19 19:22 ` Steve Cousins 0 siblings, 1 reply; 19+ messages in thread From: Steve Cousins @ 2006-09-19 17:52 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com On Tue, 19 Sep 2006, Shailendra Tripathi wrote: > >> Hi Shailendra, > >> > >> I ran the program and it reports: > >> > >> Level 6, disks=11 spare_disks=1 raid_disks=10 > >> > >> which looks good. I don't understand why you got: > >> > >> Level 5, disks=7 spare_disks=3 raid_disks=5 > >> > >> Why would it have 3 spare_disks? > > > Perhaps you are running more recent kernel than mine, and, spare_disks > now reports only actual spares. It did appear little weired that it > reported spare_disks as 3. get_array_info is changed in recent kernels > and that should explain this difference. This is a 2.6.17 kernel. So, with this in mind, is there a change that I should try in libdisk/md.c? Tim had suggested: s/nr_disks/raid_disks/ Would this be sufficient? Or should nr_disks be initialized as raid_disks and then go into the switch clause? Steve ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 17:52 ` Steve Cousins
@ 2006-09-19 19:22 ` Steve Cousins
  2006-09-19 20:19 ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Cousins @ 2006-09-19 19:22 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com

On Tue, 19 Sep 2006, Steve Cousins wrote:

> This is a 2.6.17 kernel. So, with this in mind, is there a change that I
> should try in libdisk/md.c? Tim had suggested:
>
>	s/nr_disks/raid_disks/
>
> Would this be sufficient? Or should nr_disks be initialized as raid_disks
> and then go into the switch clause?

I ended up just adding:

	md.nr_disks = md.raid_disks;

right before the switch statement, and it worked fine in my situation.
Not sure how this would work with other kernels etc., but I'll let you
figure that out.

Thanks very much for your help.

Steve

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19 19:22 ` Steve Cousins
@ 2006-09-19 20:19 ` Shailendra Tripathi
  0 siblings, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19 20:19 UTC (permalink / raw)
  To: cousins; +Cc: xfs@oss.sgi.com

Steve Cousins wrote:

>> a 2.6.17 kernel. So, with this in mind, is there a change that I
>> should try in libdisk/md.c? Tim had suggested:
>>
>>	s/nr_disks/raid_disks/
>>
>> Would this be sufficient? Or should nr_disks be initialized as raid_disks
>> and then go into the switch clause?
>
> I ended up just adding:
>
>	md.nr_disks = md.raid_disks;
>
> right before the switch statement and it worked fine in my situation.
> Not sure how this would work with other kernels etc. but I'll let you
> figure that out.
>
> Thanks very much for your help.
>
> Steve

Hi Steve,
	Technically speaking, you are doing the same thing. However, just
use the function below to avoid any confusion.

int
md_get_subvol_stripe(
	char		*dfile,
	sv_type_t	type,
	int		*sunit,
	int		*swidth,
	int		*sectalign,
	struct stat64	*sb)
{
	if (mnt_is_md_subvol(sb->st_rdev)) {
		struct md_array_info	md;
		int			fd;

		/* Open device */
		fd = open(dfile, O_RDONLY);
		if (fd == -1)
			return 0;

		/* Is this thing on... */
		if (ioctl(fd, GET_ARRAY_INFO, &md)) {
			fprintf(stderr,
				_("Error getting MD array info from %s\n"),
				dfile);
			exit(1);
		}
		close(fd);

		/*
		 * Ignore levels we don't want aligned (e.g. linear)
		 * and deduct disk(s) from stripe width on RAID4/5/6
		 */
		switch (md.level) {
		case 6:
			md.raid_disks--;
			/* fallthrough */
		case 5:
		case 4:
			md.raid_disks--;
			/* fallthrough */
		case 1:
		case 0:
		case 10:
			break;
		default:
			return 0;
		}

		/* Update sizes */
		*sunit = md.chunk_size >> 9;
		*swidth = *sunit * md.raid_disks;
		*sectalign = (md.level == 4 || md.level == 5 || md.level == 6);

		return 1;
	}
	return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu>]
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu> @ 2006-09-18 20:28 ` Steve Cousins 2006-09-18 20:44 ` Steve Cousins 1 sibling, 0 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-18 20:28 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com> Thanks very much Shailendra. I'll give it a try. Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Both of us are using old xfsprogs. It is handled in new > xfsprogs. > > */ > switch (md.level) { > case 6: > md.nr_disks--; > /* fallthrough */ > case 5: > case 4: > md.nr_disks--; > /* fallthrough */ > case 1: > case 0: > case 10: > break; > default: > return 0; > > > Regards, > > Shailendra Tripathi wrote: > > >> Hi Steve, > >> I checked the code and it appears that XFS is not *aware* > >> of RAID6. Basically, for all md devices, it gets the volume info by > >> making a an ioctl call. I can see that XFS only take care of level 4 > >> and level 5. It does not account for level 6. > >> Only extra line need to be added here as below: > >> > >> if (md.level == 6) > >> md.nr_disks -= 2; /* RAID 6 has 2 parity disks */ > >> You can try with this change if you can. Do let mew know if it solves > >> your problem. 
> >> > >> This code is in function: md_get_subvol_stripe in <xf_progs>/libdisk/md.c > >> > >> > >> /* Deduct a disk from stripe width on RAID4/5 */ > >> if (md.level == 4 || md.level == 5) > >> md.nr_disks--; > >> > >> /* Update sizes */ > >> *sunit = md.chunk_size >> 9; > >> *swidth = *sunit * md.nr_disks; > >> > >> return 1; > >> } > >> > >> Regards, > >> Shailendra > >> Steve Cousins wrote: > >> > >>> Hi Shailendra, > >>> > >>> Here is the info: > >>> > >>> 1. [root@juno ~]# cat /proc/mdstat Personalities : [raid6] md0 : > >>> active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] > >>> sdf[4] sde[3] sdd[2] sdc[1] > >>> 3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10] > >>> [UUUUUUUUUU] > >>> unused devices: <none> > >>> > >>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10 > >>> --spare-devices=1 /dev/sd[bcdefghijkl] > >>> > >>> 3. [root@juno ~]# xfs_db -r /dev/md* > >>> xfs_db> sb > >>> xfs_db> p > >>> magicnum = 0x58465342 > >>> blocksize = 4096 > >>> dblocks = 976772992 > >>> rblocks = 0 > >>> rextents = 0 > >>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4 > >>> logstart = 536870919 > >>> rootino = 256 > >>> rbmino = 257 > >>> rsumino = 258 > >>> rextsize = 144 > >>> agblocks = 30524160 > >>> agcount = 32 > >>> rbmblocks = 0 > >>> logblocks = 32768 > >>> versionnum = 0x3d84 > >>> sectsize = 4096 > >>> inodesize = 256 > >>> inopblock = 16 > >>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000" > >>> blocklog = 12 > >>> sectlog = 12 > >>> inodelog = 8 > >>> inopblog = 4 > >>> agblklog = 25 > >>> rextslog = 0 > >>> inprogress = 0 > >>> imax_pct = 25 > >>> icount = 36864 > >>> ifree = 362 > >>> fdblocks = 669630878 > >>> frextents = 0 > >>> uquotino = 0 > >>> gquotino = 0 > >>> qflags = 0 > >>> flags = 0 > >>> shared_vn = 0 > >>> inoalignmt = 2 > >>> unit = 16 > >>> width = 144 > >>> dirblklog = 0 > >>> logsectlog = 12 > >>> logsectsize = 4096 > >>> logsunit = 4096 > >>> features2 = 0 > >>> xfs_db> > >>> > >>> Thanks 
for the help. > >>> > >>> Steve > >>> > >>> ______________________________________________________________________ > >>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>> > >>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > >>> > >>> > >>> > >>>> Can you list the output of > >>>> 1. cat /proc/mdstat > >>>> 2. the command to create 8+2 RAID6 with one spare ? > >>>> 3. and output of following: > >>>> xfs_db -r /dev/md* > >>>> xfs_db> sb > >>>> xfs_db> p > >>>> > >>>> -shailendra > >>>> > >>>> Steve Cousins wrote: > >>>> > >>>> > >>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one > >>>>>> hot-spare so the number of data drives is 8. I used mkfs.xfs with > >>>>>> defaults to create the file system and it seemed to pick up the > >>>>>> chunk size > >>>>>> I used correctly (64K) but I think it got the swidth wrong. Here > >>>>>> is what > >>>>>> xfs_info says: > >>>>>> > >>>>>> =========================================================================== > >>>>>> > >>>>>> meta-data=/dev/md0 isize=256 agcount=32, > >>>>>> agsize=30524160 > >>>>>> blks > >>>>>> = sectsz=4096 attr=0 > >>>>>> data = bsize=4096 blocks=976772992, > >>>>>> imaxpct=25 > >>>>>> = sunit=16 swidth=144 blks, > >>>>>> unwritten=1 > >>>>>> naming =version 2 bsize=4096 > >>>>>> log =internal bsize=4096 blocks=32768, version=2 > >>>>>> = sectsz=4096 sunit=1 blks > >>>>>> realtime =none extsz=589824 blocks=0, rtextents=0 > >>>>>> =========================================================================== > >>>>>> > >>>>>> > >>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks > >>>>>> like it > >>>>>> thought there were 9 data drives instead of 8. > >>>>>> Am I diagnosing this correctly? Should I recreate the array and > >>>>>> explicitly set sunit=16 and swidth=128? > >>>>>> > >>>>>> Thanks for your help. 
> >>>>>> > >>>>>> Steve > >>>>>> ______________________________________________________________________ > >>>>>> > >>>>>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>>>>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>>>>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>> > >>> > >>> > >> > >> > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 [not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu> 2006-09-18 20:28 ` Steve Cousins @ 2006-09-18 20:44 ` Steve Cousins 2006-09-18 21:06 ` Shailendra Tripathi 2006-09-18 22:13 ` Shailendra Tripathi 1 sibling, 2 replies; 19+ messages in thread From: Steve Cousins @ 2006-09-18 20:44 UTC (permalink / raw) To: Shailendra Tripathi; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com> Hi again, Still no luck with 2.8.11: [root@juno xfsprogs-2.8.11]# cd mkfs [root@juno mkfs]# ./mkfs.xfs -f /dev/md0 meta-data=/dev/md0 isize=256 agcount=32, agsize=30524160 blks = sectsz=4096 attr=0 data = bsize=4096 blocks=976772992, imaxpct=25 = sunit=16 swidth=144 blks, unwritten=1 naming =version 2 bsize=4096 log =internal log bsize=4096 blocks=32768, version=2 = sectsz=4096 sunit=1 blks realtime =none extsz=589824 blocks=0, rtextents=0 Since I have a spare in there do you think it is starting with md.nr_disks = 11 and then subtracting two? Thanks, Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > Hi Steve, > Both of us are using old xfsprogs. It is handled in new > xfsprogs. > > */ > switch (md.level) { > case 6: > md.nr_disks--; > /* fallthrough */ > case 5: > case 4: > md.nr_disks--; > /* fallthrough */ > case 1: > case 0: > case 10: > break; > default: > return 0; > > > Regards, > > Shailendra Tripathi wrote: > > >> Hi Steve, > >> I checked the code and it appears that XFS is not *aware* > >> of RAID6. Basically, for all md devices, it gets the volume info by > >> making a an ioctl call. I can see that XFS only take care of level 4 > >> and level 5. It does not account for level 6. 
> >> Only extra line need to be added here as below: > >> > >> if (md.level == 6) > >> md.nr_disks -= 2; /* RAID 6 has 2 parity disks */ > >> You can try with this change if you can. Do let mew know if it solves > >> your problem. > >> > >> This code is in function: md_get_subvol_stripe in <xf_progs>/libdisk/md.c > >> > >> > >> /* Deduct a disk from stripe width on RAID4/5 */ > >> if (md.level == 4 || md.level == 5) > >> md.nr_disks--; > >> > >> /* Update sizes */ > >> *sunit = md.chunk_size >> 9; > >> *swidth = *sunit * md.nr_disks; > >> > >> return 1; > >> } > >> > >> Regards, > >> Shailendra > >> Steve Cousins wrote: > >> > >>> Hi Shailendra, > >>> > >>> Here is the info: > >>> > >>> 1. [root@juno ~]# cat /proc/mdstat Personalities : [raid6] md0 : > >>> active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5] > >>> sdf[4] sde[3] sdd[2] sdc[1] > >>> 3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10] > >>> [UUUUUUUUUU] > >>> unused devices: <none> > >>> > >>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10 > >>> --spare-devices=1 /dev/sd[bcdefghijkl] > >>> > >>> 3. 
[root@juno ~]# xfs_db -r /dev/md* > >>> xfs_db> sb > >>> xfs_db> p > >>> magicnum = 0x58465342 > >>> blocksize = 4096 > >>> dblocks = 976772992 > >>> rblocks = 0 > >>> rextents = 0 > >>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4 > >>> logstart = 536870919 > >>> rootino = 256 > >>> rbmino = 257 > >>> rsumino = 258 > >>> rextsize = 144 > >>> agblocks = 30524160 > >>> agcount = 32 > >>> rbmblocks = 0 > >>> logblocks = 32768 > >>> versionnum = 0x3d84 > >>> sectsize = 4096 > >>> inodesize = 256 > >>> inopblock = 16 > >>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000" > >>> blocklog = 12 > >>> sectlog = 12 > >>> inodelog = 8 > >>> inopblog = 4 > >>> agblklog = 25 > >>> rextslog = 0 > >>> inprogress = 0 > >>> imax_pct = 25 > >>> icount = 36864 > >>> ifree = 362 > >>> fdblocks = 669630878 > >>> frextents = 0 > >>> uquotino = 0 > >>> gquotino = 0 > >>> qflags = 0 > >>> flags = 0 > >>> shared_vn = 0 > >>> inoalignmt = 2 > >>> unit = 16 > >>> width = 144 > >>> dirblklog = 0 > >>> logsectlog = 12 > >>> logsectsize = 4096 > >>> logsunit = 4096 > >>> features2 = 0 > >>> xfs_db> > >>> > >>> Thanks for the help. > >>> > >>> Steve > >>> > >>> ______________________________________________________________________ > >>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>> > >>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote: > >>> > >>> > >>> > >>>> Can you list the output of > >>>> 1. cat /proc/mdstat > >>>> 2. the command to create 8+2 RAID6 with one spare ? > >>>> 3. and output of following: > >>>> xfs_db -r /dev/md* > >>>> xfs_db> sb > >>>> xfs_db> p > >>>> > >>>> -shailendra > >>>> > >>>> Steve Cousins wrote: > >>>> > >>>> > >>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one > >>>>>> hot-spare so the number of data drives is 8. 
I used mkfs.xfs with > >>>>>> defaults to create the file system and it seemed to pick up the > >>>>>> chunk size > >>>>>> I used correctly (64K) but I think it got the swidth wrong. Here > >>>>>> is what > >>>>>> xfs_info says: > >>>>>> > >>>>>> =========================================================================== > >>>>>> > >>>>>> meta-data=/dev/md0 isize=256 agcount=32, > >>>>>> agsize=30524160 > >>>>>> blks > >>>>>> = sectsz=4096 attr=0 > >>>>>> data = bsize=4096 blocks=976772992, > >>>>>> imaxpct=25 > >>>>>> = sunit=16 swidth=144 blks, > >>>>>> unwritten=1 > >>>>>> naming =version 2 bsize=4096 > >>>>>> log =internal bsize=4096 blocks=32768, version=2 > >>>>>> = sectsz=4096 sunit=1 blks > >>>>>> realtime =none extsz=589824 blocks=0, rtextents=0 > >>>>>> =========================================================================== > >>>>>> > >>>>>> > >>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks > >>>>>> like it > >>>>>> thought there were 9 data drives instead of 8. > >>>>>> Am I diagnosing this correctly? Should I recreate the array and > >>>>>> explicitly set sunit=16 and swidth=128? > >>>>>> > >>>>>> Thanks for your help. > >>>>>> > >>>>>> Steve > >>>>>> ______________________________________________________________________ > >>>>>> > >>>>>> Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > >>>>>> Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > >>>>>> Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>> > >>> > >>> > >> > >> > > > > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 20:44 ` Steve Cousins
@ 2006-09-18 21:06 ` Shailendra Tripathi
  2006-09-18 22:13 ` Shailendra Tripathi
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 21:06 UTC (permalink / raw)
  To: cousins; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>

> Since I have a spare in there do you think it is starting with md.nr_disks
> = 11 and then subtracting two?

You can verify that very quickly by removing the spare_disks option and
seeing whether it gives proper results.

-shailendra

> Thanks,
>
> Steve
> ______________________________________________________________________
>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>
> On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>
>> Hi Steve,
>>	Both of us are using old xfsprogs. It is handled in new
>> xfsprogs.
>>
>>	*/
>>	switch (md.level) {
>>	case 6:
>>		md.nr_disks--;
>>		/* fallthrough */
>>	case 5:
>>	case 4:
>>		md.nr_disks--;
>>		/* fallthrough */
>>	case 1:
>>	case 0:
>>	case 10:
>>		break;
>>	default:
>>		return 0;
>>
>> Regards,
>>
>> Shailendra Tripathi wrote:

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 20:44 ` Steve Cousins
  2006-09-18 21:06 ` Shailendra Tripathi
@ 2006-09-18 22:13 ` Shailendra Tripathi
  2006-09-19  5:11 ` Timothy Shimmin
  1 sibling, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 22:13 UTC (permalink / raw)
  To: cousins; +Cc: "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Hi Steve,
	Your guess appears to be correct. md_ioctl returns nr, which is
the total number of disks in the array, including the spare disks.
However, the XFS function md_get_vol_stripe does not take spare disks
into account; it needs to subtract spare_disks as well.
	However, the md.spare_disks returned by the call counts spare +
parity disks (both). So, one way would be to subtract spare_disks
directly. Otherwise, XFS should rely on md.raid_disks; this does not
include spare_disks, and nr_disks would need to be changed for that.

When I run my program md_info on a raid5 array with 5 devices and 2
spares, I get:

[root@ga09 root]# ./a.out /dev/md11
Level 5, disks=7 spare_disks=3 raid_disks=5

Steve, can you please compile the pasted program and run it on your
system with the md prepared. It takes /dev/md<no> as input.
In your case, you should get the above line as:
Level 6, disks=11 spare disks=3 raid_disks=10

	nr=working=active=failed=spare=0;
	ITERATE_RDEV(mddev,rdev,tmp) {
		nr++;
		if (rdev->faulty)
			failed++;
		else {
			working++;
			if (rdev->in_sync)
				active++;
			else
				spare++;
		}
	}

	info.level = mddev->level;
	info.size = mddev->size;
	info.nr_disks = nr;
	....
	info.active_disks = active;
	info.working_disks = working;
	info.failed_disks = failed;
	info.spare_disks = spare;

-shailendra

The program is pasted below:
md_info.c. Takes /dev/md<no> as name. For example, /dev/md11.
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <unistd.h>	/* close() */
#include <fcntl.h>
#include <sys/ioctl.h>

#ifndef MD_MAJOR
#define MD_MAJOR 9
#endif

#define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, struct md_array_info)

struct md_array_info {
	__uint32_t major_version;
	__uint32_t minor_version;
	__uint32_t patch_version;
	__uint32_t ctime;
	__uint32_t level;
	__uint32_t size;
	__uint32_t nr_disks;
	__uint32_t raid_disks;
	__uint32_t md_minor;
	__uint32_t not_persistent;
	/*
	 * Generic state information
	 */
	__uint32_t utime;		/* 0 Superblock update time */
	__uint32_t state;		/* 1 State bits (clean, ...) */
	__uint32_t active_disks;	/* 2 Number of currently active disks */
	__uint32_t working_disks;	/* 3 Number of working disks */
	__uint32_t failed_disks;	/* 4 Number of failed disks */
	__uint32_t spare_disks;		/* 5 Number of spare disks */
	/*
	 * Personality information
	 */
	__uint32_t layout;		/* 0 the array's physical layout */
	__uint32_t chunk_size;		/* 1 chunk size in bytes */
};

int main(int argc, char *argv[])
{
	struct md_array_info md;
	int fd;

	/* Open device */
	fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		printf("Could not open %s\n", argv[1]);
		exit(1);
	}
	if (ioctl(fd, GET_ARRAY_INFO, &md)) {
		printf("Error getting MD array info from %s\n", argv[1]);
		exit(1);
	}
	close(fd);
	printf("Level %d, disks=%d spare_disks=%d raid_disks=%d\n",
		md.level, md.nr_disks,
		md.spare_disks, md.raid_disks);
	return 0;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6 2006-09-18 22:13 ` Shailendra Tripathi @ 2006-09-19 5:11 ` Timothy Shimmin 2006-09-19 6:44 ` Shailendra Tripathi 0 siblings, 1 reply; 19+ messages in thread From: Timothy Shimmin @ 2006-09-19 5:11 UTC (permalink / raw) To: Shailendra Tripathi Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com> Hi Shailendra and Steve, Shailendra Tripathi wrote: > Hi Steve, > Your guess appears to be correct. md_ioctl returns nr which > is total number of disk in the array including the spare disks. However, > XFS function md_get_vol_stripe does not take spare disk into account. It > needs to subtract spare_disks as well. > However, md.spare_disks returned by the call returns spare + parity > (both). So, one way could be substract spare_disks directly. Otherwise, > the xfs should rely on md.raid_disks. This does not include spare_disks > and nr.disks should be changed for that. > When I run my program md_info on raid5 array with 5 devices and 2 > spares, I get > [root@ga09 root]# ./a.out /dev/md11 > Level 5, disks=7 spare_disks=3 raid_disks=5 > > Steve can you please compile the pasted program and run on your system > with md prepared. It takes /dev/md<no> as input. > In your case, you should get above line as: > Level 6, disks=11 spare disks=3 raid_disks=10 > > nr=working=active=failed=spare=0; > ITERATE_RDEV(mddev,rdev,tmp) { > nr++; > if (rdev->faulty) > failed++; > else { > working++; > if (rdev->in_sync) > active++; > else > spare++; > } > } > > info.level = mddev->level; > info.size = mddev->size; > info.nr_disks = nr; > .... 
> info.active_disks = active;
> info.working_disks = working;
> info.failed_disks = failed;
> info.spare_disks = spare;
>
> -shailendra

I'm not that au fait with RAID and md, but looking at what you wrote,
Shailendra, and the md code, instead of your suggestions
(what I think are your suggestions:) of:

(1) subtracting parity from md.raid_disks (instead of md.nr_disks),
    where we work out parity by switching on md.level
or
(2) using directly: (md.nr_disks - md.spare_disks);

we could instead use:
(3) md.active_disks directly,

i.e.
	*swidth = *sunit * md.active_disks;

I presume that active is the working disks, excluding spares and parity.

Does that make sense?
--Tim

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19  5:11           ` Timothy Shimmin
@ 2006-09-19  6:44             ` Shailendra Tripathi
  2006-09-19  7:02               ` Timothy Shimmin
  0 siblings, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-19  6:44 UTC (permalink / raw)
  To: Timothy Shimmin
  Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Hi Tim,

> I'm not that au fait with RAID and md, but looking at what you wrote,
> Shailendra, and at the md code, instead of your suggestions
> (what I think are your suggestions :) of:
>
> (1) subtracting parity from md.raid_disks (instead of md.nr_disks),
>     where we work out the parity count by switching on md.level
> or
> (2) using directly: (md.nr_disks - md.spare_disks);
>
> we could instead:
> (3) use directly: md.active_disks
>
> i.e.
> 	*swidth = *sunit * md.active_disks;
>
> I presume that "active" means the working disks that are neither
> spares nor parity.
>
> Does that make sense?

I agree with you that for an operational raid, since there would not be
any faulty disks, active_disks should be the number of disks. However, I
am just concerned that active_disks tracks live disks (not failed
disks). If we ever used these commands when the system had a faulty
drive, the information returned wouldn't be correct. Though, from the
XFS perspective, I can't think of where that could happen.
I would still say let's rely on raid_disks to be more conservative;
just my choice.

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-19  6:44             ` Shailendra Tripathi
@ 2006-09-19  7:02               ` Timothy Shimmin
  0 siblings, 0 replies; 19+ messages in thread
From: Timothy Shimmin @ 2006-09-19  7:02 UTC (permalink / raw)
  To: Shailendra Tripathi
  Cc: cousins, "xfs@oss.sgi.com" <xfs@oss.sgi.com>

Shailendra Tripathi wrote:
>
> Hi Tim,
>
>> I'm not that au fait with RAID and md, but looking at what you wrote,
>> Shailendra, and at the md code, instead of your suggestions
>> (what I think are your suggestions :) of:
>>
>> (1) subtracting parity from md.raid_disks (instead of md.nr_disks),
>>     where we work out the parity count by switching on md.level
>> or
>> (2) using directly: (md.nr_disks - md.spare_disks);
>>
>> we could instead:
>> (3) use directly: md.active_disks
>>
>> i.e.
>> 	*swidth = *sunit * md.active_disks;
>>
>> I presume that "active" means the working disks that are neither
>> spares nor parity.
>>
>> Does that make sense?
>
> I agree with you that for an operational raid, since there would not
> be any faulty disks, active_disks should be the number of disks.
> However, I am just concerned that active_disks tracks live disks (not
> failed disks). If we ever used these commands when the system had a
> faulty drive, the information returned wouldn't be correct. Though,
> from the XFS perspective, I can't think of where that could happen.
> I would still say let's rely on raid_disks to be more conservative;
> just my choice.

I see your point.
I can just change md_get_subvol_stripe(): s/nr_disks/raid_disks/
I just liked the idea of removing the switch statement, which could
potentially get out of date in the future. Too bad :)

--Tim

^ permalink raw reply	[flat|nested] 19+ messages in thread
[parent not found: <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>]
* Re: swidth with mdadm and RAID6
  [not found] <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>
@ 2006-09-18 15:33 ` Steve Cousins
  2006-09-18 18:10   ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Cousins @ 2006-09-18 15:33 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: xfs@oss.sgi.com

Hi Shailendra,

Here is the info:

1. [root@juno ~]# cat /proc/mdstat
Personalities : [raid6]
md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
sdf[4] sde[3] sdd[2] sdc[1]
      3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
[UUUUUUUUUU]

unused devices: <none>

2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
--spare-devices=1 /dev/sd[bcdefghijkl]

3. [root@juno ~]# xfs_db -r /dev/md*
xfs_db> sb
xfs_db> p
magicnum = 0x58465342
blocksize = 4096
dblocks = 976772992
rblocks = 0
rextents = 0
uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
logstart = 536870919
rootino = 256
rbmino = 257
rsumino = 258
rextsize = 144
agblocks = 30524160
agcount = 32
rbmblocks = 0
logblocks = 32768
versionnum = 0x3d84
sectsize = 4096
inodesize = 256
inopblock = 16
fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
blocklog = 12
sectlog = 12
inodelog = 8
inopblog = 4
agblklog = 25
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 36864
ifree = 362
fdblocks = 669630878
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 16
width = 144
dirblklog = 0
logsectlog = 12
logsectsize = 4096
logsunit = 4096
features2 = 0
xfs_db>

Thanks for the help.

Steve
______________________________________________________________________
 Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
 Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
 Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

On Mon, 18 Sep 2006, Shailendra Tripathi wrote:

> Can you list the output of
> 1. cat /proc/mdstat
> 2. the command to create 8+2 RAID6 with one spare ?
> 3. and output of following:
>     xfs_db -r /dev/md*
>     xfs_db> sb
>     xfs_db> p
>
> -shailendra
>
> Steve Cousins wrote:
> >> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
> >> hot-spare so the number of data drives is 8. I used mkfs.xfs with
> >> defaults to create the file system and it seemed to pick up the chunk
> >> size I used correctly (64K) but I think it got the swidth wrong. Here
> >> is what xfs_info says:
> >>
> >> ===========================================================================
> >> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
> >>          =                 sectsz=4096  attr=0
> >> data     =                 bsize=4096   blocks=976772992, imaxpct=25
> >>          =                 sunit=16     swidth=144 blks, unwritten=1
> >> naming   =version 2        bsize=4096
> >> log      =internal         bsize=4096   blocks=32768, version=2
> >>          =                 sectsz=4096  sunit=1 blks
> >> realtime =none             extsz=589824 blocks=0, rtextents=0
> >> ===========================================================================
> >>
> >> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
> >> it thought there were 9 data drives instead of 8.
> >>
> >> Am I diagnosing this correctly? Should I recreate the array and
> >> explicitly set sunit=16 and swidth=128?
> >>
> >> Thanks for your help.
> >>
> >> Steve
> >> ______________________________________________________________________
> >>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
> >>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
> >>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 15:33 ` Steve Cousins
@ 2006-09-18 18:10   ` Shailendra Tripathi
  2006-09-18 18:19     ` Shailendra Tripathi
  0 siblings, 1 reply; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 18:10 UTC (permalink / raw)
  To: cousins; +Cc: xfs@oss.sgi.com

Hi Steve,
            I checked the code and it appears that XFS is not *aware*
of RAID6. Basically, for all md devices, it gets the volume info by
making an ioctl call. I can see that XFS only takes care of level 4 and
level 5. It does not account for level 6.
Only one extra line needs to be added, as below:

	if (md.level == 6)
		md.nr_disks -= 2;	/* RAID 6 has 2 parity disks */

You can try with this change if you can. Do let me know if it solves
your problem.

This code is in function md_get_subvol_stripe in <xfsprogs>/libdisk/md.c:

	/* Deduct a disk from stripe width on RAID4/5 */
	if (md.level == 4 || md.level == 5)
		md.nr_disks--;

	/* Update sizes */
	*sunit = md.chunk_size >> 9;
	*swidth = *sunit * md.nr_disks;

	return 1;
}

Regards,
Shailendra

Steve Cousins wrote:
>Hi Shailendra,
>
>Here is the info:
>
>1. [root@juno ~]# cat /proc/mdstat
>Personalities : [raid6]
>md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
>sdf[4] sde[3] sdd[2] sdc[1]
>      3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
>[UUUUUUUUUU]
>
>unused devices: <none>
>
>2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
>--spare-devices=1 /dev/sd[bcdefghijkl]
>
>3. [root@juno ~]# xfs_db -r /dev/md*
>xfs_db> sb
>xfs_db> p
>magicnum = 0x58465342
>blocksize = 4096
>dblocks = 976772992
>rblocks = 0
>rextents = 0
>uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
>logstart = 536870919
>rootino = 256
>rbmino = 257
>rsumino = 258
>rextsize = 144
>agblocks = 30524160
>agcount = 32
>rbmblocks = 0
>logblocks = 32768
>versionnum = 0x3d84
>sectsize = 4096
>inodesize = 256
>inopblock = 16
>fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>blocklog = 12
>sectlog = 12
>inodelog = 8
>inopblog = 4
>agblklog = 25
>rextslog = 0
>inprogress = 0
>imax_pct = 25
>icount = 36864
>ifree = 362
>fdblocks = 669630878
>frextents = 0
>uquotino = 0
>gquotino = 0
>qflags = 0
>flags = 0
>shared_vn = 0
>inoalignmt = 2
>unit = 16
>width = 144
>dirblklog = 0
>logsectlog = 12
>logsectsize = 4096
>logsunit = 4096
>features2 = 0
>xfs_db>
>
>
>Thanks for the help.
>
>Steve
>
>______________________________________________________________________
> Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
> Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
> Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>
>On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>
>>Can you list the output of
>>1. cat /proc/mdstat
>>2. the command to create 8+2 RAID6 with one spare ?
>>3. and output of following:
>>    xfs_db -r /dev/md*
>>    xfs_db> sb
>>    xfs_db> p
>>
>>-shailendra
>>
>>Steve Cousins wrote:
>>
>>>>I have a RAID6 array of 11 500 GB drives using mdadm. There is one
>>>>hot-spare so the number of data drives is 8. I used mkfs.xfs with
>>>>defaults to create the file system and it seemed to pick up the chunk
>>>>size I used correctly (64K) but I think it got the swidth wrong. Here
>>>>is what xfs_info says:
>>>>
>>>>===========================================================================
>>>>meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>>>>         =                 sectsz=4096  attr=0
>>>>data     =                 bsize=4096   blocks=976772992, imaxpct=25
>>>>         =                 sunit=16     swidth=144 blks, unwritten=1
>>>>naming   =version 2        bsize=4096
>>>>log      =internal         bsize=4096   blocks=32768, version=2
>>>>         =                 sectsz=4096  sunit=1 blks
>>>>realtime =none             extsz=589824 blocks=0, rtextents=0
>>>>===========================================================================
>>>>
>>>>So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
>>>>it thought there were 9 data drives instead of 8.
>>>>
>>>>Am I diagnosing this correctly? Should I recreate the array and
>>>>explicitly set sunit=16 and swidth=128?
>>>>
>>>>Thanks for your help.
>>>>
>>>>Steve
>>>>______________________________________________________________________
>>>> Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>>> Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>>> Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-18 18:10   ` Shailendra Tripathi
@ 2006-09-18 18:19     ` Shailendra Tripathi
  0 siblings, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 18:19 UTC (permalink / raw)
  To: Shailendra Tripathi; +Cc: cousins, xfs@oss.sgi.com

Hi Steve,
         Both of us are using an old xfsprogs. It is handled in the new
xfsprogs:

	switch (md.level) {
	case 6:
		md.nr_disks--;
		/* fallthrough */
	case 5:
	case 4:
		md.nr_disks--;
		/* fallthrough */
	case 1:
	case 0:
	case 10:
		break;
	default:
		return 0;
	}

Regards,

Shailendra Tripathi wrote:
> Hi Steve,
>             I checked the code and it appears that XFS is not *aware*
> of RAID6. Basically, for all md devices, it gets the volume info by
> making an ioctl call. I can see that XFS only takes care of level 4
> and level 5. It does not account for level 6.
> Only one extra line needs to be added, as below:
>
> 	if (md.level == 6)
> 		md.nr_disks -= 2;	/* RAID 6 has 2 parity disks */
>
> You can try with this change if you can. Do let me know if it solves
> your problem.
>
> This code is in function md_get_subvol_stripe in <xfsprogs>/libdisk/md.c:
>
> 	/* Deduct a disk from stripe width on RAID4/5 */
> 	if (md.level == 4 || md.level == 5)
> 		md.nr_disks--;
>
> 	/* Update sizes */
> 	*sunit = md.chunk_size >> 9;
> 	*swidth = *sunit * md.nr_disks;
>
> 	return 1;
> }
>
> Regards,
> Shailendra
>
> Steve Cousins wrote:
>
>> Hi Shailendra,
>>
>> Here is the info:
>>
>> 1. [root@juno ~]# cat /proc/mdstat
>> Personalities : [raid6]
>> md0 : active raid6 sdb[0] sdl[10](S) sdk[9] sdj[8] sdi[7] sdh[6] sdg[5]
>> sdf[4] sde[3] sdd[2] sdc[1]
>>       3907091968 blocks level 6, 64k chunk, algorithm 2 [10/10]
>> [UUUUUUUUUU]
>>
>> unused devices: <none>
>>
>> 2. mdadm --create /dev/md0 --chunk=64 --level=6 --raid-devices=10
>> --spare-devices=1 /dev/sd[bcdefghijkl]
>>
>> 3. [root@juno ~]# xfs_db -r /dev/md*
>> xfs_db> sb
>> xfs_db> p
>> magicnum = 0x58465342
>> blocksize = 4096
>> dblocks = 976772992
>> rblocks = 0
>> rextents = 0
>> uuid = 04b32cce-ed38-496f-811f-2ccd51450bf4
>> logstart = 536870919
>> rootino = 256
>> rbmino = 257
>> rsumino = 258
>> rextsize = 144
>> agblocks = 30524160
>> agcount = 32
>> rbmblocks = 0
>> logblocks = 32768
>> versionnum = 0x3d84
>> sectsize = 4096
>> inodesize = 256
>> inopblock = 16
>> fname = "\000\000\000\000\000\000\000\000\000\000\000\000"
>> blocklog = 12
>> sectlog = 12
>> inodelog = 8
>> inopblog = 4
>> agblklog = 25
>> rextslog = 0
>> inprogress = 0
>> imax_pct = 25
>> icount = 36864
>> ifree = 362
>> fdblocks = 669630878
>> frextents = 0
>> uquotino = 0
>> gquotino = 0
>> qflags = 0
>> flags = 0
>> shared_vn = 0
>> inoalignmt = 2
>> unit = 16
>> width = 144
>> dirblklog = 0
>> logsectlog = 12
>> logsectsize = 4096
>> logsunit = 4096
>> features2 = 0
>> xfs_db>
>>
>> Thanks for the help.
>>
>> Steve
>>
>> ______________________________________________________________________
>>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302
>>
>> On Mon, 18 Sep 2006, Shailendra Tripathi wrote:
>>
>>> Can you list the output of
>>> 1. cat /proc/mdstat
>>> 2. the command to create 8+2 RAID6 with one spare ?
>>> 3. and output of following:
>>>     xfs_db -r /dev/md*
>>>     xfs_db> sb
>>>     xfs_db> p
>>>
>>> -shailendra
>>>
>>> Steve Cousins wrote:
>>>
>>>>> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
>>>>> hot-spare so the number of data drives is 8. I used mkfs.xfs with
>>>>> defaults to create the file system and it seemed to pick up the
>>>>> chunk size I used correctly (64K) but I think it got the swidth
>>>>> wrong. Here is what xfs_info says:
>>>>>
>>>>> ===========================================================================
>>>>> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>>>>>          =                 sectsz=4096  attr=0
>>>>> data     =                 bsize=4096   blocks=976772992, imaxpct=25
>>>>>          =                 sunit=16     swidth=144 blks, unwritten=1
>>>>> naming   =version 2        bsize=4096
>>>>> log      =internal         bsize=4096   blocks=32768, version=2
>>>>>          =                 sectsz=4096  sunit=1 blks
>>>>> realtime =none             extsz=589824 blocks=0, rtextents=0
>>>>> ===========================================================================
>>>>>
>>>>> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks
>>>>> like it thought there were 9 data drives instead of 8.
>>>>> Am I diagnosing this correctly? Should I recreate the array and
>>>>> explicitly set sunit=16 and swidth=128?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>> Steve
>>>>> ______________________________________________________________________
>>>>>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>>>>>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>>>>>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
* swidth with mdadm and RAID6
@ 2006-09-15 21:07 Steve Cousins
2006-09-15 23:49 ` Peter Grandi
2006-09-18 14:50 ` Shailendra Tripathi
0 siblings, 2 replies; 19+ messages in thread
From: Steve Cousins @ 2006-09-15 21:07 UTC (permalink / raw)
To: xfs
I have a RAID6 array of 11 500 GB drives using mdadm. There is one
hot-spare, so the number of data drives is 8. I used mkfs.xfs with
defaults to create the file system, and it seemed to pick up the chunk
size I used correctly (64K), but I think it got the swidth wrong. Here
is what xfs_info says:
===========================================================================
meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
         =                 sectsz=4096  attr=0
data     =                 bsize=4096   blocks=976772992, imaxpct=25
         =                 sunit=16     swidth=144 blks, unwritten=1
naming   =version 2        bsize=4096
log      =internal         bsize=4096   blocks=32768, version=2
         =                 sectsz=4096  sunit=1 blks
realtime =none             extsz=589824 blocks=0, rtextents=0
===========================================================================
So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9, so it looks like
it thought there were 9 data drives instead of 8.
Am I diagnosing this correctly? Should I recreate the array and
explicitly set sunit=16 and swidth=128?
Thanks for your help.
Steve
______________________________________________________________________
Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu
Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu
Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302
^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: swidth with mdadm and RAID6
  2006-09-15 21:07 Steve Cousins
@ 2006-09-15 23:49 ` Peter Grandi
  2006-09-18 14:50 ` Shailendra Tripathi
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Grandi @ 2006-09-15 23:49 UTC (permalink / raw)
  To: Linux XFS

>>> On Fri, 15 Sep 2006 17:07:07 -0400 (EDT), Steve Cousins
>>> <cousins@limpet.umeoce.maine.edu> said:

cousins> I have a RAID6 array of 11 500 GB drives using mdadm.
cousins> There is one hot-spare so the number of data drives is
cousins> 8. I used mkfs.xfs with defaults to create the file
cousins> system and it seemed to pick up the chunk size I used
cousins> correctly (64K) but I think it got the swidth wrong.

Worrying about the impact on performance of a relatively small
thing like 'swidth' for something like an 8+2 RAID6 is quite
funny.

  http://WWW.BAARF.com/

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: swidth with mdadm and RAID6
  2006-09-15 21:07 Steve Cousins
  2006-09-15 23:49 ` Peter Grandi
@ 2006-09-18 14:50 ` Shailendra Tripathi
  1 sibling, 0 replies; 19+ messages in thread
From: Shailendra Tripathi @ 2006-09-18 14:50 UTC (permalink / raw)
  To: cousins; +Cc: xfs

Can you list the output of
1. cat /proc/mdstat
2. the command to create 8+2 RAID6 with one spare ?
3. and output of following:
    xfs_db -r /dev/md*
    xfs_db> sb
    xfs_db> p

-shailendra

Steve Cousins wrote:
> I have a RAID6 array of 11 500 GB drives using mdadm. There is one
> hot-spare so the number of data drives is 8. I used mkfs.xfs with
> defaults to create the file system and it seemed to pick up the chunk
> size I used correctly (64K) but I think it got the swidth wrong. Here
> is what xfs_info says:
>
> ===========================================================================
> meta-data=/dev/md0         isize=256    agcount=32, agsize=30524160 blks
>          =                 sectsz=4096  attr=0
> data     =                 bsize=4096   blocks=976772992, imaxpct=25
>          =                 sunit=16     swidth=144 blks, unwritten=1
> naming   =version 2        bsize=4096
> log      =internal         bsize=4096   blocks=32768, version=2
>          =                 sectsz=4096  sunit=1 blks
> realtime =none             extsz=589824 blocks=0, rtextents=0
> ===========================================================================
>
> So, sunit*bsize=64K, but swidth=144 and swidth/sunit=9 so it looks like
> it thought there were 9 data drives instead of 8.
>
> Am I diagnosing this correctly? Should I recreate the array and
> explicitly set sunit=16 and swidth=128?
>
> Thanks for your help.
>
> Steve
> ______________________________________________________________________
>  Steve Cousins, Ocean Modeling Group    Email: cousins@umit.maine.edu
>  Marine Sciences, 452 Aubert Hall       http://rocky.umeoce.maine.edu
>  Univ. of Maine, Orono, ME 04469        Phone: (207) 581-4302

^ permalink raw reply	[flat|nested] 19+ messages in thread
end of thread, other threads:[~2006-09-19 20:20 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <fc.004c4d192b2da8e03b9aca0078918430.2b2da8e5@umit.maine.edu>
2006-09-19 16:36 ` swidth with mdadm and RAID6 Steve Cousins
2006-09-19 16:58 ` Shailendra Tripathi
2006-09-19 17:13 ` Steve Cousins
[not found] <fc.004c4d192b3470d73b9aca0029fcf469.2b349301@umit.maine.edu>
2006-09-19 17:52 ` Steve Cousins
2006-09-19 19:22 ` Steve Cousins
2006-09-19 20:19 ` Shailendra Tripathi
[not found] <fc.004c4d192b2c45a93b9aca00fc3f0f38.2b2c4b4d@umit.maine.edu>
2006-09-18 20:28 ` Steve Cousins
2006-09-18 20:44 ` Steve Cousins
2006-09-18 21:06 ` Shailendra Tripathi
2006-09-18 22:13 ` Shailendra Tripathi
2006-09-19 5:11 ` Timothy Shimmin
2006-09-19 6:44 ` Shailendra Tripathi
2006-09-19 7:02 ` Timothy Shimmin
[not found] <fc.004c4d192b2a17d13b9aca00b4f73745.2b2a26d7@umit.maine.edu>
2006-09-18 15:33 ` Steve Cousins
2006-09-18 18:10 ` Shailendra Tripathi
2006-09-18 18:19 ` Shailendra Tripathi
2006-09-15 21:07 Steve Cousins
2006-09-15 23:49 ` Peter Grandi
2006-09-18 14:50 ` Shailendra Tripathi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox