cluster-devel.redhat.com archive mirror
From: Andrew Price <anprice@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
Date: Thu, 30 Jan 2014 17:31:22 +0000
Message-ID: <52EA8C6A.3080600@redhat.com>
In-Reply-To: <1391009563.2729.23.camel@menhir>

On 29/01/14 15:32, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
>> This adds some fields to the superblock and resource group header
>> structures that we can use in rg size and address discovery in gfs2_grow
>> and fsck.gfs2. They are not intended to be changed after mkfs time.
>>
>> sb_rgsize is the base resource group size used by mkfs.gfs2, before any
>> adjustment or alignment. It is required in order to extend the fs with
>> the correct resource group size in gfs2_grow and can also be used by
>> fsck.gfs2 when rebuilding broken resource groups.
>>
> I still don't see the point of adding this, really. We can calculate a
> sensible size and use that for extending the rgrps.

I'm not really sure what you mean by a sensible size. Ideally we should 
be able to know or predict the actual rgrp sizes, in order to reuse the 
code which builds resource groups in mkfs in other rgrp appending, 
discovery and fixing code. Having the original value at our disposal 
would allow us to do that and take some guesswork out of fsck.gfs2. I 
think users would appreciate the consistency between the arguments they 
gave to mkfs.gfs2 and the values gfs2_grow uses, too.
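
As a rough illustration of how a grow tool might consume the recorded
value, here is a minimal sketch. The helper names are hypothetical and
not part of gfs2-utils, and it assumes that file systems created before
this change have zero in the field (since it was previously padding), in
which case the tool falls back to its own estimate.

```c
#include <stdint.h>

/* Decode a big-endian 32-bit on-disk value without relying on <endian.h>. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: choose the base rgrp size for appending rgrps.
 * sb_rgsize_raw is the raw on-disk sb_rgsize field; estimated is
 * whatever size the tool would otherwise have guessed from the layout.
 * Assumption: zero means "not recorded" (older file system).
 */
static uint32_t grow_rgsize(const unsigned char *sb_rgsize_raw,
                            uint32_t estimated)
{
	uint32_t recorded = load_be32(sb_rgsize_raw);

	return recorded != 0 ? recorded : estimated;
}
```

With this shape, the guesswork only applies to file systems which never
had the value recorded.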

> It might be worth
> considering a suitable interface to ask the kernel where the existing
> rgrps are (all of them!) from userland while the fs is mounted though.
> If the fs is not mounted, then the information can be easily gathered by
> looking at the existing rgrp layout.

That might be fine for gfs2_grow's purposes, but if the fs has a 
corrupted rindex then it will still be difficult for fsck.gfs2 to get 
the information reliably.

>> rg_next is the address of the next resource group and is set by
>> mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
>> used by other tools which need to read the resource groups sequentially.
>>
> It needs to be set elsewhere too - there is no reason that we cannot
> upgrade older fs by adding this info each time we write an rgrp header
> that does not already have this info in it.

Yes, that makes sense, it'd be set by gfs2_convert also.
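
To show the kind of sequential read that rg_next is meant to help with,
here is a small sketch over in-memory stand-ins for rgrp headers. The
struct and function names are made up for illustration, and it assumes
that an unset hint (or the last rgrp) holds zero, which the patch does
not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an rgrp header, already in host byte order. */
struct rg_hint {
	uint64_t addr; /* block address of this rgrp header */
	uint64_t next; /* proposed rg_next hint; 0 assumed to mean unset */
};

/*
 * Follow the next hints starting from rgs[0] and return how many rgrps
 * were reached. The walk stops at a zero hint, at a hint that does not
 * point forward, or at a hint that matches no known header, so a bad
 * hint can never cause a loop.
 */
static size_t walk_rg_next(const struct rg_hint *rgs, size_t n)
{
	size_t i = 0, visited = 0;

	while (i < n) {
		uint64_t next = rgs[i].next;
		size_t j;

		visited++;
		if (next == 0 || next <= rgs[i].addr)
			break;
		/* Find the header the hint points at, if there is one. */
		for (j = i + 1; j < n && rgs[j].addr != next; j++)
			;
		i = j;
	}
	return visited;
}
```

Since the field is only a hint, a tool like fsck.gfs2 would still need
to verify each header it lands on rather than trust the chain.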

> Also, we could use the 32
> bit field rather than a 64 bit one, since the max size of the rgrp is 32
> bits I think? Or is there some corner case that we need to take care of
> perhaps?

Well I had intended it would be an absolute fs block address but if we 
use an offset then we have to keep in mind that there will be an 
alignment gap after the end of a rgrp in many cases. I think we'd have 
to find a storage array with pretty gigantic stripes to exhaust that 
address space though.
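
A sketch of the offset variant, to make the trade-off concrete. The
function name is hypothetical. The offset is the distance between two
consecutive rgrp header addresses, so it covers the rgrp itself plus any
alignment gap, and the encoding fails if that distance does not fit in
32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: express the next rgrp's address as a 32-bit
 * offset (in fs blocks) from this rgrp's address.
 * Returns 0 if next_addr is not after this_addr or if the distance
 * (rgrp length plus alignment gap) exceeds 32 bits.
 */
static uint32_t rg_next_offset(uint64_t this_addr, uint64_t next_addr)
{
	uint64_t delta;

	if (next_addr <= this_addr)
		return 0;
	delta = next_addr - this_addr;
	if (delta > UINT32_MAX)
		return 0;
	return (uint32_t)delta;
}
```

An absolute 64-bit address avoids the failure case entirely, at the cost
of four more bytes of the reserved space.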

>> rg_uuid is intended to be the same as sb_uuid for the file system. It
>> can be used by fsck.gfs2, when searching for resource group headers, in
>> order to distinguish resource groups created as part of a previous file
>> system on the device from resource groups in the current file system.
>>
> Again, this could be updated by writing the rgrps back to deal with
> older filesystems which need to be upgraded.

Yes.

> That could be done as a one
> off sweep, or as and when we write each rgrp. Also I wonder - if the
> field is zero, we know that the rgrp is an old one that doesn't have it
> set, but if someone changes the uuid at a later date, then what?

That's a good point.

> Maybe
> we can use the uuid as a way to set it to start with (in mkfs), but
> after that we'd use the value from the first rgrp to fill in later
> rgrps. If the first rgrp was zero then we'd not update the other rgrps
> until the first rgrp had a value in it. Or something like that... I just
> want to be certain that we understand what this field will mean in all
> possible cases,

Yes, unless you'd prefer a separate sb_rg_uuid field in the superblock 
we should treat it differently to the sb_uuid after mkfs and only expect 
the rg_uuids to be consistent with themselves. There's a corner case 
where the first rgrp might have its uuid while the others still have 
zero, though, which will need some more thought.
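
One possible way to spell out the cases, comparing each rgrp against the
first rgrp rather than against sb_uuid. This is only a sketch of the
idea under discussion; the enum and function names are invented here,
and how each state should be handled is still open.

```c
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

enum rg_uuid_state {
	RG_UUID_UNSET,        /* this rgrp's uuid is all zeroes */
	RG_UUID_NO_REFERENCE, /* first rgrp has no uuid to compare against */
	RG_UUID_MATCH,        /* same uuid as the first rgrp */
	RG_UUID_FOREIGN,      /* differs: possibly from a previous fs */
};

static int uuid_is_zero(const uint8_t *uuid)
{
	static const uint8_t zero[UUID_LEN];

	return memcmp(uuid, zero, UUID_LEN) == 0;
}

/*
 * Classify an rgrp's uuid relative to the first rgrp's uuid.
 * The corner case where the first rgrp has a uuid but a later one is
 * still zero comes out as RG_UUID_UNSET, i.e. "not yet upgraded".
 */
static enum rg_uuid_state classify_rg_uuid(const uint8_t *first,
                                           const uint8_t *rg)
{
	if (uuid_is_zero(rg))
		return RG_UUID_UNSET;
	if (uuid_is_zero(first))
		return RG_UUID_NO_REFERENCE;
	if (memcmp(first, rg, UUID_LEN) == 0)
		return RG_UUID_MATCH;
	return RG_UUID_FOREIGN;
}
```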

Andy

>
> Steve.
>
>> Signed-off-by: Andrew Price <anprice@redhat.com>
>> ---
>>   include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
>> index 0f24c07..f1489cb 100644
>> --- a/include/uapi/linux/gfs2_ondisk.h
>> +++ b/include/uapi/linux/gfs2_ondisk.h
>> @@ -118,7 +118,8 @@ struct gfs2_sb {
>>
>>   	__be32 sb_bsize;
>>   	__be32 sb_bsize_shift;
>> -	__u32 __pad1;	/* Was journal segment size in gfs1 */
>> +	__be32 sb_rgsize; /* Resource group size used on fs creation.
>> +	                     Was journal segment size in gfs1 */
>>
>>   	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
>>   	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
>> @@ -131,6 +132,7 @@ struct gfs2_sb {
>>   	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
>>   #define GFS2_HAS_UUID 1
>>   	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>> +
>>   };
>>
>>   /*
>> @@ -188,8 +190,10 @@ struct gfs2_rgrp {
>>   	__be32 rg_dinodes;
>>   	__be32 __pad;
>>   	__be64 rg_igeneration;
>> +	__be64 rg_next; /* Address of the next resource group */
>> +	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>>
>> -	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
>> +	__u8 rg_reserved[56]; /* Several fields from gfs1 now reserved */
>>   };
>>
>>   /*
>
>
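
A compile-time check along the following lines could guard the rgrp
layout. It assumes the header is meant to keep its existing on-disk
size: rg_next (8 bytes) and rg_uuid (16 bytes) together take 24 of the
80 previously reserved bytes, which leaves 56. The struct names are
stand-ins which model only the tail of struct gfs2_rgrp, with plain
integer types in place of the kernel's __be64 and __u8.

```c
#include <stdint.h>

/* Tail of the rgrp header before the change: all reserved. */
struct rg_tail_old {
	uint8_t rg_reserved[80];
};

/* Tail after the change, sized so that the total stays the same. */
struct rg_tail_new {
	uint64_t rg_next;         /* address of the next resource group */
	uint8_t  rg_uuid[16];     /* may be zero on older file systems */
	uint8_t  rg_reserved[56]; /* 80 - 8 - 16 */
};

/* Fails the build if the new fields change the on-disk size. */
_Static_assert(sizeof(struct rg_tail_new) == sizeof(struct rg_tail_old),
               "rgrp tail must keep its on-disk size");
```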



Thread overview (6+ messages):
2014-01-29 14:47 [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow Andrew Price
2014-01-29 14:54 ` Andrew Price
2014-01-29 15:32 ` Steven Whitehouse
2014-01-30 17:31   ` Andrew Price [this message]
     [not found]     ` <1391107830.2725.26.camel@menhir>
2014-04-07 15:50       ` Andrew Price
2014-04-09 12:29         ` Steven Whitehouse
