From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Price <anprice@redhat.com>
Date: Thu, 06 Jun 2013 14:04:02 +0100
Subject: [Cluster-devel] [PATCH 2/4] mkfs.gfs2: Align resource groups to
 RAID stripes
In-Reply-To: <83456593.48037538.1370523424334.JavaMail.root@redhat.com>
References: <1370520213-29676-1-git-send-email-anprice@redhat.com>
	<1370520213-29676-2-git-send-email-anprice@redhat.com>
	<83456593.48037538.1370523424334.JavaMail.root@redhat.com>
Message-ID: <51B088C2.4080701@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 06/06/13 13:57, Bob Peterson wrote:
> Hi,
>
> | +			/* Squeeze the last 1 or 2 rgs into the remaining space */
> | +			if ((nextaddr < sdp->device.length) && (sdp->device.length - nextaddr >= minrgsz)) {
> | +				rglen = sdp->device.length - nextaddr;
> | +			} else {
> | +				if (sdp->device.length - rgaddr <= maxrgsz)
> | +					rgt->length = sdp->device.length - rgaddr;
> | +				else
> | +					rgt->length = maxrgsz;
> | +				/* This is the last rg */
> | +				nextaddr = 0;
>
> In GFS1, we allowed mix-and-match resource group sizes, but we originally
> designed mkfs.gfs2 to ensure that all rgrps were the same uniform size. This
> usually means some space is wasted at the end of the last resource group.
>
> We did this primarily so that fsck.gfs2 could more easily detect and repair
> damaged resource groups and rindex values. At the time it was designed, I got
> the buy-in of a bunch of developers and we all agreed to it. Since that time,
> I've had to change fsck.gfs2 to take more drastic measures to repair damaged
> resource groups, due to the fact that gfs2_convert can convert a GFS1 file
> system to GFS2, and thus, we can still end up with non-uniform resource groups.
> Many customers were adding storage and doing multiple gfs_grow ops,
> which resulted in metadata sets where the rgrps and rindex were complete chaos.
>
> Still, my assumption has always been: If the file system was made by
> mkfs.gfs2, all resource groups (but the first one) are identical in size.
>
> I think gfs2_grow takes some steps to ensure that new rgrps are also created
> using the same size as the current resource groups. If we don't enforce
> that rule, the rindex could once again become chaos, which means our chances
> of rgrp and rindex repair get worse.
>
> Do we still want to enforce this rule?

Good question. I had assumed that we didn't have a rule like that, since
the rindex specifies the rg sizes. My next planned mkfs change is to allow
the journal creation code to ask for a resource group large enough to
contain all of a journal's data blocks, so that they're always a single
extent. Going back to enforcing the rule would have implications for that
plan, too.
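
To make the rule under discussion concrete, here is a minimal sketch of
the uniformity assumption quoted above ("all resource groups (but the
first one) are identical in size"). The struct and the function name are
hypothetical and simplified for illustration; they are not taken from
gfs2-utils, and a real check would read the sizes from the rindex.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rindex entry: only the fields this sketch needs. */
struct rg_entry {
	uint64_t addr;   /* first block of the resource group */
	uint64_t length; /* size of the resource group, in blocks */
};

/*
 * Return 1 if every resource group except the first has the same length,
 * 0 otherwise. The first rg is skipped because it is allowed to differ.
 * With two or fewer entries there is nothing to compare, so the result is 1.
 */
int rgs_uniform_after_first(const struct rg_entry *rgs, size_t count)
{
	size_t i;

	for (i = 2; i < count; i++) {
		if (rgs[i].length != rgs[1].length)
			return 0;
	}
	return 1;
}
```

A layout whose last rg has been squeezed into the remaining space, as in
the quoted hunk, would fail this check, and so would one containing a
single larger rg sized to hold a journal.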

Andy

> With the improved rgrp repair algorithms in fsck.gfs2, it may not be
> necessary anymore. I'm not trying to be dogmatic; I'm looking for opinions here.
>
> Regards,
>
> Bob Peterson
> Red Hat File Systems
>