* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Andrew Price @ 2014-01-29 14:47 UTC
To: cluster-devel.redhat.com

This adds some fields to the superblock and resource group header
structures that we can use in rg size and address discovery in gfs2_grow
and fsck.gfs2. They are not intended to be changed after mkfs time.

sb_rgsize is the base resource group size used by mkfs.gfs2, before any
adjustment or alignment. It is required in order to extend the fs with
the correct resource group size in gfs2_grow and can also be used by
fsck.gfs2 when rebuilding broken resource groups.

rg_next is the address of the next resource group and is set by
mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
used by other tools which need to read the resource groups sequentially.

rg_uuid is intended to be the same as sb_uuid for the file system. It
can be used by fsck.gfs2, when searching for resource group headers, in
order to distinguish resource groups created as part of a previous file
system on the device from resource groups in the current file system.

Signed-off-by: Andrew Price <anprice@redhat.com>
---
 include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index 0f24c07..f1489cb 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -118,7 +118,8 @@ struct gfs2_sb {
 
 	__be32 sb_bsize;
 	__be32 sb_bsize_shift;
-	__u32 __pad1;	/* Was journal segment size in gfs1 */
+	__be32 sb_rgsize; /* Resource group size used on fs creation.
+	                     Was journal segment size in gfs1 */
 
 	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
 	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
@@ -131,6 +132,7 @@ struct gfs2_sb {
 	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
 #define GFS2_HAS_UUID 1
 	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
+
 };
 
 /*
@@ -188,8 +190,10 @@ struct gfs2_rgrp {
 	__be32 rg_dinodes;
 	__be32 __pad;
 	__be64 rg_igeneration;
+	__be64 rg_next; /* Address of the next resource group */
+	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */
 
-	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
+	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
 };
 
 /*
-- 
1.8.3.1
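
The rg_uuid paragraph above describes a comparison that fsck.gfs2 could
make while scanning for resource group headers. A minimal sketch of that
comparison follows. It is not code from the patch or from gfs2-utils:
the enum, the function names and the three-way result are hypothetical,
and treating an all-zero rg_uuid as "unknown" is an assumption taken
from the "maybe 0 for backwards compat" comment in the diff.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical classification of a candidate rgrp header found by a scan. */
enum rg_origin {
	RG_CURRENT, /* rg_uuid matches the superblock's sb_uuid */
	RG_STALE,   /* rg_uuid is set but belongs to some other file system */
	RG_UNKNOWN, /* rg_uuid is all zeroes, so the header predates the field */
};

/* True if every byte of the UUID is zero. */
static bool uuid_is_zero(const uint8_t uuid[UUID_LEN])
{
	static const uint8_t zero[UUID_LEN];

	return memcmp(uuid, zero, UUID_LEN) == 0;
}

/* Compare a candidate header's rg_uuid against sb_uuid (assumed semantics). */
enum rg_origin classify_rgrp(const uint8_t sb_uuid[UUID_LEN],
			     const uint8_t rg_uuid[UUID_LEN])
{
	if (uuid_is_zero(rg_uuid))
		return RG_UNKNOWN;
	if (memcmp(sb_uuid, rg_uuid, UUID_LEN) == 0)
		return RG_CURRENT;
	return RG_STALE;
}
```

A scanner built this way could skip RG_STALE headers outright and fall
back to other heuristics for RG_UNKNOWN ones.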
* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Andrew Price @ 2014-01-29 14:54 UTC
To: cluster-devel.redhat.com

On 29/01/14 14:47, Andrew Price wrote:
> This adds some fields to the superblock and resource group header
> structures that we can use in rg size and address discovery in gfs2_grow
> and fsck.gfs2. They are not intended to be changed after mkfs time.
>
> sb_rgsize is the base resource group size used by mkfs.gfs2, before any
> adjustment or alignment. It is required in order to extend the fs with
> the correct resource group size in gfs2_grow and can also be used by
> fsck.gfs2 when rebuilding broken resource groups.
>
> rg_next is the address of the next resource group and is set by
> mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
> used by other tools which need to read the resource groups sequentially.
>
> rg_uuid is intended to be the same as sb_uuid for the file system. It
> can be used by fsck.gfs2, when searching for resource group headers, in
> order to distinguish resource groups created as part of a previous file
> system on the device from resource groups in the current file system.
>
> Signed-off-by: Andrew Price <anprice@redhat.com>
> ---
>  include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
> index 0f24c07..f1489cb 100644
> --- a/include/uapi/linux/gfs2_ondisk.h
> +++ b/include/uapi/linux/gfs2_ondisk.h
> @@ -118,7 +118,8 @@ struct gfs2_sb {
>
>  	__be32 sb_bsize;
>  	__be32 sb_bsize_shift;
> -	__u32 __pad1;	/* Was journal segment size in gfs1 */
> +	__be32 sb_rgsize; /* Resource group size used on fs creation.
> +	                     Was journal segment size in gfs1 */
>
>  	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
>  	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
> @@ -131,6 +132,7 @@ struct gfs2_sb {
>  	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
>  #define GFS2_HAS_UUID 1
>  	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
> +

Please ignore :)

>  };
>
>  /*
> @@ -188,8 +190,10 @@ struct gfs2_rgrp {
>  	__be32 rg_dinodes;
>  	__be32 __pad;
>  	__be64 rg_igeneration;
> +	__be64 rg_next; /* Address of the next resource group */
> +	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>
> -	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
> +	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */

Oops, this should have been __u8 rg_reserved[56];

>  };
>
>  /*
>
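
The correction above follows from keeping the on-disk structure the same
size: an 8-byte rg_next plus a 16-byte rg_uuid must come out of the 80
reserved bytes, leaving 56. The sketch below checks that arithmetic at
compile time. The two tail structs are hypothetical stand-ins for the
end of struct gfs2_rgrp, with uint64_t and uint8_t substituted for
__be64 and __u8; they are not the kernel definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail of struct gfs2_rgrp before the patch. */
struct rg_tail_old {
	uint8_t rg_reserved[80];
};

/* The same region after carving out rg_next and rg_uuid. */
struct rg_tail_new {
	uint64_t rg_next;        /* 8 bytes */
	uint8_t rg_uuid[16];     /* 16 bytes */
	uint8_t rg_reserved[56]; /* 80 - 8 - 16 */
};

/* Fails to compile if the new fields change the size of the region. */
_Static_assert(sizeof(struct rg_tail_new) == sizeof(struct rg_tail_old),
	       "new rgrp fields must not change the on-disk size");

/* Bytes gained (positive) or lost (negative) for a given reserved length. */
long rg_tail_delta(size_t reserved_len)
{
	return (long)(sizeof(uint64_t) + 16 + reserved_len) - 80;
}
```

With this helper, rg_tail_delta(56) is 0, while the rg_reserved[64] in
the posted patch gives 8, meaning the struct would have grown by 8 bytes.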
* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Steven Whitehouse @ 2014-01-29 15:32 UTC
To: cluster-devel.redhat.com

Hi,

On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
> This adds some fields to the superblock and resource group header
> structures that we can use in rg size and address discovery in gfs2_grow
> and fsck.gfs2. They are not intended to be changed after mkfs time.
>
> sb_rgsize is the base resource group size used by mkfs.gfs2, before any
> adjustment or alignment. It is required in order to extend the fs with
> the correct resource group size in gfs2_grow and can also be used by
> fsck.gfs2 when rebuilding broken resource groups.
>
I still don't see the point of adding this, really. We can calculate a
sensible size and use that for extending the rgrps. It might be worth
considering a suitable interface to ask the kernel where the existing
rgrps are (all of them!) from userland while the fs is mounted though.
If the fs is not mounted, then the information can be easily gathered by
looking at the existing rgrp layout.

> rg_next is the address of the next resource group and is set by
> mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
> used by other tools which need to read the resource groups sequentially.
>
It needs to be set elsewhere too - there is no reason that we cannot
upgrade older fs by adding this info each time we write an rgrp header
that does not already have this info in it. Also, we could use the 32
bit field rather than a 64 bit one, since the max size of the rgrp is 32
bits I think? Or is there some corner case that we need to take care of
perhaps?

> rg_uuid is intended to be the same as sb_uuid for the file system. It
> can be used by fsck.gfs2, when searching for resource group headers, in
> order to distinguish resource groups created as part of a previous file
> system on the device from resource groups in the current file system.
>
Again, this could be updated by writing the rgrps back to deal with
older filesystems which need to be upgraded. That could be done as a one
off sweep, or as and when we write each rgrp. Also I wonder - if the
field is zero, we know that the rgrp is an old one that doesn't have it
set, but if someone changes the uuid at a later date, then what? Maybe
we can use the uuid as a way to set it to start with (in mkfs), but
after that we'd use the value from the first rgrp to fill in later
rgrps. If the first rgrp was zero then we'd not update the other rgrps
until the first rgrp had a value in it. Or something like that... I just
want to be certain that we understand what this field will mean in all
possible cases,

Steve.

> Signed-off-by: Andrew Price <anprice@redhat.com>
> ---
>  include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
> index 0f24c07..f1489cb 100644
> --- a/include/uapi/linux/gfs2_ondisk.h
> +++ b/include/uapi/linux/gfs2_ondisk.h
> @@ -118,7 +118,8 @@ struct gfs2_sb {
>
>  	__be32 sb_bsize;
>  	__be32 sb_bsize_shift;
> -	__u32 __pad1;	/* Was journal segment size in gfs1 */
> +	__be32 sb_rgsize; /* Resource group size used on fs creation.
> +	                     Was journal segment size in gfs1 */
>
>  	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
>  	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
> @@ -131,6 +132,7 @@ struct gfs2_sb {
>  	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
>  #define GFS2_HAS_UUID 1
>  	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
> +
>  };
>
>  /*
> @@ -188,8 +190,10 @@ struct gfs2_rgrp {
>  	__be32 rg_dinodes;
>  	__be32 __pad;
>  	__be64 rg_igeneration;
> +	__be64 rg_next; /* Address of the next resource group */
> +	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>
> -	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
> +	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
>  };
>
>  /*
* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Andrew Price @ 2014-01-30 17:31 UTC
To: cluster-devel.redhat.com

On 29/01/14 15:32, Steven Whitehouse wrote:
> Hi,
>
> On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
>> This adds some fields to the superblock and resource group header
>> structures that we can use in rg size and address discovery in gfs2_grow
>> and fsck.gfs2. They are not intended to be changed after mkfs time.
>>
>> sb_rgsize is the base resource group size used by mkfs.gfs2, before any
>> adjustment or alignment. It is required in order to extend the fs with
>> the correct resource group size in gfs2_grow and can also be used by
>> fsck.gfs2 when rebuilding broken resource groups.
>>
> I still don't see the point of adding this, really. We can calculate a
> sensible size and use that for extending the rgrps.

I'm not really sure what you mean by a sensible size. Ideally we should
be able to know or predict the actual rgrp sizes, in order to reuse the
code which builds resource groups in mkfs in other rgrp appending,
discovery and fixing code. Having the original value at our disposal
would allow us to do that and take some guesswork out of fsck.gfs2. I
think users would appreciate the consistency between the arguments they
gave to mkfs.gfs2 and the values gfs2_grow uses, too.

> It might be worth
> considering a suitable interface to ask the kernel where the existing
> rgrps are (all of them!) from userland while the fs is mounted though.
> If the fs is not mounted, then the information can be easily gathered by
> looking at the existing rgrp layout.

That might be fine for gfs2_grow's purposes, but if the fs has a
corrupted rindex then it will still be difficult for fsck.gfs2 to get
the information reliably.

>> rg_next is the address of the next resource group and is set by
>> mkfs.gfs2. It is intended to be used as a hint to fsck.gfs2 and can be
>> used by other tools which need to read the resource groups sequentially.
>>
> It needs to be set elsewhere too - there is no reason that we cannot
> upgrade older fs by adding this info each time we write an rgrp header
> that does not already have this info in it.

Yes, that makes sense, it'd be set by gfs2_convert also.

> Also, we could use the 32
> bit field rather than a 64 bit one, since the max size of the rgrp is 32
> bits I think? Or is there some corner case that we need to take care of
> perhaps?

Well I had intended it would be an absolute fs block address but if we
use an offset then we have to keep in mind that there will be an
alignment gap after the end of a rgrp in many cases. I think we'd have
to find a storage array with pretty gigantic stripes to exhaust that
address space though.

>> rg_uuid is intended to be the same as sb_uuid for the file system. It
>> can be used by fsck.gfs2, when searching for resource group headers, in
>> order to distinguish resource groups created as part of a previous file
>> system on the device from resource groups in the current file system.
>>
> Again, this could be updated by writing the rgrps back to deal with
> older filesystems which need to be upgraded.

Yes.

> That could be done as a one
> off sweep, or as and when we write each rgrp. Also I wonder - if the
> field is zero, we know that the rgrp is an old one that doesn't have it
> set, but if someone changes the uuid at a later date, then what?

That's a good point.

> Maybe
> we can use the uuid as a way to set it to start with (in mkfs), but
> after that we'd use the value from the first rgrp to fill in later
> rgrps. If the first rgrp was zero then we'd not update the other rgrps
> until the first rgrp had a value in it. Or something like that... I just
> want to be certain that we understand what this field will mean in all
> possible cases,

Yes, unless you'd prefer a separate sb_rg_uuid field in the superblock
we should treat it differently to the sb_uuid after mkfs and only expect
the rg_uuids to be consistent with themselves. There's a corner case
where the first rgrp might have its uuid while the others still have
zero, though, which will need some more thought.

Andy

>
> Steve.
>
>> Signed-off-by: Andrew Price <anprice@redhat.com>
>> ---
>>  include/uapi/linux/gfs2_ondisk.h | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
>> index 0f24c07..f1489cb 100644
>> --- a/include/uapi/linux/gfs2_ondisk.h
>> +++ b/include/uapi/linux/gfs2_ondisk.h
>> @@ -118,7 +118,8 @@ struct gfs2_sb {
>>
>>  	__be32 sb_bsize;
>>  	__be32 sb_bsize_shift;
>> -	__u32 __pad1;	/* Was journal segment size in gfs1 */
>> +	__be32 sb_rgsize; /* Resource group size used on fs creation.
>> +	                     Was journal segment size in gfs1 */
>>
>>  	struct gfs2_inum sb_master_dir; /* Was jindex dinode in gfs1 */
>>  	struct gfs2_inum __pad2; /* Was rindex dinode in gfs1 */
>> @@ -131,6 +132,7 @@ struct gfs2_sb {
>>  	struct gfs2_inum __pad4; /* Was licence inode in gfs1 */
>>  #define GFS2_HAS_UUID 1
>>  	__u8 sb_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>> +
>>  };
>>
>>  /*
>> @@ -188,8 +190,10 @@ struct gfs2_rgrp {
>>  	__be32 rg_dinodes;
>>  	__be32 __pad;
>>  	__be64 rg_igeneration;
>> +	__be64 rg_next; /* Address of the next resource group */
>> +	__u8 rg_uuid[16]; /* The UUID, maybe 0 for backwards compat */
>>
>> -	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
>> +	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
>>  };
>>
>>  /*
>
>
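
The exchange above weighs an absolute 64-bit block address against a
32-bit offset for rg_next. The sketch below illustrates the offset
variant only. The thread does not settle the exact semantics, so the
following are assumptions: the value is a distance in fs blocks from
this rgrp's header to the next one (so any alignment gap is already
included), it is stored big-endian, and zero means "not set". The
function names are hypothetical.

```c
#include <stdint.h>

/* Decode a big-endian 32-bit on-disk value, independent of host byte order. */
static uint32_t be32_decode(const uint8_t b[4])
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Assumed semantics: rg_next holds the distance, in fs blocks, from this
 * rgrp header to the next one. Returns the next header's block address,
 * or 0 when the field is unset and the caller must fall back to rindex
 * or a scan.
 */
uint64_t next_rgrp_addr(uint64_t this_addr, const uint8_t rg_next_be[4])
{
	uint32_t dist = be32_decode(rg_next_be);

	if (dist == 0)
		return 0;
	return this_addr + dist;
}
```

Because the sum is done in 64 bits, a 32-bit distance still reaches any
block address as long as no single gap between headers exceeds 2^32 - 1
blocks.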
[parent not found: <1391107830.2725.26.camel@menhir>]
* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Andrew Price @ 2014-04-07 15:50 UTC
To: cluster-devel.redhat.com

[Didn't CC cluster-devel - re-sending]

Hi,

So this conversation went dormant for a while but now that the resource
group size question is out of the way (sb_rgsize definitely isn't
needed) and the usage of rg_next is pretty straightforward, we still
have the semantics of rg_uuid to pin down. To recap:

diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index db3fdd0..e425413 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -186,10 +186,11 @@ struct gfs2_rgrp {
 	__be32 rg_flags;
 	__be32 rg_free;
 	__be32 rg_dinodes;
-	__be32 __pad;
+	__be32 rg_next;
 	__be64 rg_igeneration;
+	__u8 rg_uuid[16];
 
-	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
+	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
 };
 
 /*

On 30/01/14 18:50, Steven Whitehouse wrote:
> On Thu, 2014-01-30 at 17:31 +0000, Andrew Price wrote:
>> On 29/01/14 15:32, Steven Whitehouse wrote:
>>> On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
>>>> rg_uuid is intended to be the same as sb_uuid for the file system. It
>>>> can be used by fsck.gfs2, when searching for resource group headers, in
>>>> order to distinguish resource groups created as part of a previous file
>>>> system on the device from resource groups in the current file system.
>>>>
>>> Again, this could be updated by writing the rgrps back to deal with
>>> older filesystems which need to be upgraded.
>>
>> Yes.
>>
>>> That could be done as a one
>>> off sweep, or as and when we write each rgrp. Also I wonder - if the
>>> field is zero, we know that the rgrp is an old one that doesn't have it
>>> set, but if someone changes the uuid at a later date, then what?
>>
>> That's a good point.
>>
>>> Maybe
>>> we can use the uuid as a way to set it to start with (in mkfs), but
>>> after that we'd use the value from the first rgrp to fill in later
>>> rgrps. If the first rgrp was zero then we'd not update the other rgrps
>>> until the first rgrp had a value in it. Or something like that... I just
>>> want to be certain that we understand what this field will mean in all
>>> possible cases,
>>
>> Yes, unless you'd prefer a separate sb_rg_uuid field in the superblock
>> we should treat it differently to the sb_uuid after mkfs and only expect
>> the rg_uuids to be consistent with themselves. There's a corner case
>> where the first rgrp might have its uuid while the others still have
>> zero, though, which will need some more thought.
>>
> I don't think we need a sb field really. We can look at the first rgrp
> and if it's zero, we know that we can set it to a random value, and then
> update all the other rgrps to the same value. Or something along those
> lines, anyway,

I agree that the sb field probably isn't required. From fsck's
perspective I'm a little worried about the lack of atomicity. If we're
using the first rgrp's uuid as an indicator of what the others' should
be, it might be safer for gfs2 to update all of the rgrps' uuids except
the first one and then update the first one, once we know the rest have
been updated. Then, if the first one is still zero we know all bets are
off and we shouldn't even look at rg_uuid any more because an update
wasn't completed. Is that feasible/sensible?

Andy
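
The ordering proposed above (stamp every rgrp except the first, then
stamp the first as a commit marker) can be sketched against an in-memory
array. This is an illustration only: struct rg_hdr, both function names
and the max_writes parameter (which simulates an interrupted sweep) are
hypothetical, and real storage would also need the earlier writes to be
durable before the first rgrp is written, which is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical in-memory stand-in for an rgrp header. */
struct rg_hdr {
	uint8_t rg_uuid[UUID_LEN];
};

/*
 * Stamp uuid into rgs[1..n-1] first and rgs[0] last, so that a non-zero
 * uuid in the first rgrp implies the sweep completed. max_writes caps the
 * number of headers written, to simulate an interruption. Returns the
 * number of headers actually written.
 */
size_t stamp_rg_uuids(struct rg_hdr *rgs, size_t n,
		      const uint8_t uuid[UUID_LEN], size_t max_writes)
{
	size_t written = 0;

	for (size_t i = 1; i < n && written < max_writes; i++, written++)
		memcpy(rgs[i].rg_uuid, uuid, UUID_LEN);

	/* Only commit the first rgrp once all of the others are done. */
	if (n > 0 && written == n - 1 && written < max_writes) {
		memcpy(rgs[0].rg_uuid, uuid, UUID_LEN);
		written++;
	}
	return written;
}

/* fsck-side check: trust rg_uuid only if the first rgrp's uuid is set. */
bool rg_uuids_usable(const struct rg_hdr *rgs, size_t n)
{
	static const uint8_t zero[UUID_LEN];

	return n > 0 && memcmp(rgs[0].rg_uuid, zero, UUID_LEN) != 0;
}
```

If the sweep stops early, rgs[0] stays zero and rg_uuids_usable() reports
false, which matches the "all bets are off" case described above.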
* [Cluster-devel] [RFC] gfs2: Add sb and rgrp fields to aid fsck and grow
From: Steven Whitehouse @ 2014-04-09 12:29 UTC
To: cluster-devel.redhat.com

Hi,

Yes - if it's easy enough to implement using the first rg as an
indicator for the others, then that sounds ok to me. I'm not sure how
easy that would be without looking more closely at the code though,

Steve.

On 07/04/14 16:50, Andrew Price wrote:
> [Didn't CC cluster-devel - re-sending]
>
> Hi,
>
> So this conversation went dormant for a while but now that the
> resource group size question is out of the way (sb_rgsize definitely
> isn't needed) and the usage of rg_next is pretty straightforward, we
> still have the semantics of rg_uuid to pin down. To recap:
>
> diff --git a/include/uapi/linux/gfs2_ondisk.h
> b/include/uapi/linux/gfs2_ondisk.h
> index db3fdd0..e425413 100644
> --- a/include/uapi/linux/gfs2_ondisk.h
> +++ b/include/uapi/linux/gfs2_ondisk.h
> @@ -186,10 +186,11 @@ struct gfs2_rgrp {
>  	__be32 rg_flags;
>  	__be32 rg_free;
>  	__be32 rg_dinodes;
> -	__be32 __pad;
> +	__be32 rg_next;
>  	__be64 rg_igeneration;
> +	__u8 rg_uuid[16];
>
> -	__u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */
> +	__u8 rg_reserved[64]; /* Several fields from gfs1 now reserved */
>  };
>
>  /*
>
> On 30/01/14 18:50, Steven Whitehouse wrote:
>> On Thu, 2014-01-30 at 17:31 +0000, Andrew Price wrote:
>>> On 29/01/14 15:32, Steven Whitehouse wrote:
>>>> On Wed, 2014-01-29 at 14:47 +0000, Andrew Price wrote:
>>>>> rg_uuid is intended to be the same as sb_uuid for the file system. It
>>>>> can be used by fsck.gfs2, when searching for resource group
>>>>> headers, in
>>>>> order to distinguish resource groups created as part of a previous
>>>>> file
>>>>> system on the device from resource groups in the current file system.
>>>>>
>>>> Again, this could be updated by writing the rgrps back to deal with
>>>> older filesystems which need to be upgraded.
>>>
>>> Yes.
>>>
>>>> That could be done as a one
>>>> off sweep, or as and when we write each rgrp. Also I wonder - if the
>>>> field is zero, we know that the rgrp is an old one that doesn't
>>>> have it
>>>> set, but if someone changes the uuid at a later date, then what?
>>>
>>> That's a good point.
>>>
>>>> Maybe
>>>> we can use the uuid as a way to set it to start with (in mkfs), but
>>>> after that we'd use the value from the first rgrp to fill in later
>>>> rgrps. If the first rgrp was zero then we'd not update the other rgrps
>>>> until the first rgrp had a value in it. Or something like that... I
>>>> just
>>>> want to be certain that we understand what this field will mean in all
>>>> possible cases,
>>>
>>> Yes, unless you'd prefer a separate sb_rg_uuid field in the superblock
>>> we should treat it differently to the sb_uuid after mkfs and only
>>> expect
>>> the rg_uuids to be consistent with themselves. There's a corner case
>>> where the first rgrp might have its uuid while the others still have
>>> zero, though, which will need some more thought.
>>>
>> I don't think we need a sb field really. We can look at the first rgrp
>> and if it's zero, we know that we can set it to a random value, and then
>> update all the other rgrps to the same value. Or something along those
>> lines, anyway,
>
> I agree that the sb field probably isn't required. From fsck's
> perspective I'm a little worried about the lack of atomicity. If we're
> using the first rgrp's uuid as an indicator of what the others' should
> be, it might be safer for gfs2 to update all of the rgrps' uuids
> except the first one and then update the first one, once we know the
> rest have been updated. Then, if the first one is still zero we know
> all bets are off and we shouldn't even look at rg_uuid any more
> because an update wasn't completed. Is that feasible/sensible?
>
> Andy