* [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy
@ 2013-04-10 18:24 Mark Tinguely
2013-04-10 18:56 ` Eric Sandeen
2013-04-11 3:00 ` Dave Chinner
0 siblings, 2 replies; 9+ messages in thread
From: Mark Tinguely @ 2013-04-10 18:24 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-reserve-parent-and-allocation-fields-in-inode.patch --]
[-- Type: text/plain, Size: 3118 bytes --]
Reserve fields in new inode layout for parent pointer and
allocation policy.
----
The inode will hold the parent information for the first
link to a file. Information for the other links will be
held in extended attribute entries.
The "di_parino" is the inode of the parent directory. The
directory information for this entry is located the parent
directory with "di_paroff" offset.
The di_parino/di_paroff concept code is running.
----
The "di_allocpolicy" will be used to remember the allocation
policy associated with this inode.
Signed-off-by: Mark Tinguely <tinguely@sgi.com>
---
fs/xfs/xfs_dinode.h | 3 +++
fs/xfs/xfs_inode.c | 6 ++++++
fs/xfs/xfs_inode.h | 3 +++
3 files changed, 12 insertions(+)
Index: b/fs/xfs/xfs_dinode.h
===================================================================
--- a/fs/xfs/xfs_dinode.h
+++ b/fs/xfs/xfs_dinode.h
@@ -76,6 +76,9 @@ typedef struct xfs_dinode {
__be64 di_changecount; /* number of attribute changes */
__be64 di_lsn; /* flush sequence */
__be64 di_flags2; /* more random flags */
+ __be64 di_parino; /* parent inode */
+ __be32 di_paroff; /* offset into parent directory */
+ __be32 di_allocpolicy; /* allocation policy number */
__u8 di_pad2[16]; /* more padding for future expansion */
/* fields only written to during inode creation */
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -875,6 +875,9 @@ xfs_dinode_from_disk(
to->di_flags2 = be64_to_cpu(from->di_flags2);
to->di_ino = be64_to_cpu(from->di_ino);
to->di_lsn = be64_to_cpu(from->di_lsn);
+ to->di_parino = be64_to_cpu(from->di_parino);
+ to->di_paroff = be32_to_cpu(from->di_paroff);
+ to->di_allocpolicy = be32_to_cpu(from->di_allocpolicy);
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &from->di_uuid);
}
@@ -922,6 +925,9 @@ xfs_dinode_to_disk(
to->di_flags2 = cpu_to_be64(from->di_flags2);
to->di_ino = cpu_to_be64(from->di_ino);
to->di_lsn = cpu_to_be64(from->di_lsn);
+ to->di_parino = cpu_to_be64(from->di_parino);
+ to->di_paroff = cpu_to_be32(from->di_paroff);
+ to->di_allocpolicy = cpu_to_be32(from->di_allocpolicy);
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &from->di_uuid);
}
Index: b/fs/xfs/xfs_inode.h
===================================================================
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -159,6 +159,9 @@ typedef struct xfs_icdinode {
__uint64_t di_changecount; /* number of attribute changes */
xfs_lsn_t di_lsn; /* flush sequence */
__uint64_t di_flags2; /* more random flags */
+ xfs_ino_t di_parino; /* parent inode */
+ __uint32_t di_paroff; /* offset into parent directory */
+ __uint32_t di_allocpolicy; /* allocation policy number */
__uint8_t di_pad2[16]; /* more padding for future expansion */
/* fields only written to during inode creation */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely @ 2013-04-10 18:56 ` Eric Sandeen 2013-04-10 19:39 ` Rich Johnston 2013-04-11 3:00 ` Dave Chinner 1 sibling, 1 reply; 9+ messages in thread From: Eric Sandeen @ 2013-04-10 18:56 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On 4/10/13 1:24 PM, Mark Tinguely wrote: > Reserve fields in new inode layout for parent pointer and > allocation policy. > ---- > The inode will hold the parent information for the first > link to a file. Information for the other links will be > held in extended attribute entries. > > The "di_parino" is the inode of the parent directory. The > directory information for this entry is located the parent > directory with "di_paroff" offset. > > The di_parino/di_paroff concept code is running. > ---- > The "di_allocpolicy" will be used to remember the allocation > policy associated with this inode. can you say more about this allocation policy? -Eric > Signed-off-by: Mark Tinguely <tinguely@sgi.com> > --- > fs/xfs/xfs_dinode.h | 3 +++ > fs/xfs/xfs_inode.c | 6 ++++++ > fs/xfs/xfs_inode.h | 3 +++ > 3 files changed, 12 insertions(+) > > Index: b/fs/xfs/xfs_dinode.h > =================================================================== > --- a/fs/xfs/xfs_dinode.h > +++ b/fs/xfs/xfs_dinode.h > @@ -76,6 +76,9 @@ typedef struct xfs_dinode { > __be64 di_changecount; /* number of attribute changes */ > __be64 di_lsn; /* flush sequence */ > __be64 di_flags2; /* more random flags */ > + __be64 di_parino; /* parent inode */ > + __be32 di_paroff; /* offset into parent directory */ > + __be32 di_allocpolicy; /* allocation policy number */ > __u8 di_pad2[16]; /* more padding for future expansion */ > > /* fields only written to during inode creation */ > Index: b/fs/xfs/xfs_inode.c > =================================================================== > --- a/fs/xfs/xfs_inode.c > +++ b/fs/xfs/xfs_inode.c > @@ -875,6 +875,9 @@ xfs_dinode_from_disk( > to->di_flags2 = be64_to_cpu(from->di_flags2); > to->di_ino = be64_to_cpu(from->di_ino); > to->di_lsn = be64_to_cpu(from->di_lsn); > + to->di_parino = be64_to_cpu(from->di_parino); > + to->di_paroff = be32_to_cpu(from->di_paroff); > + to->di_allocpolicy = be32_to_cpu(from->di_allocpolicy); > memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2)); > uuid_copy(&to->di_uuid, &from->di_uuid); > } > @@ -922,6 +925,9 @@ xfs_dinode_to_disk( > to->di_flags2 = cpu_to_be64(from->di_flags2); > to->di_ino = cpu_to_be64(from->di_ino); > to->di_lsn = cpu_to_be64(from->di_lsn); > + to->di_parino = cpu_to_be64(from->di_parino); > + to->di_paroff = cpu_to_be32(from->di_paroff); > + to->di_allocpolicy = cpu_to_be32(from->di_allocpolicy); > memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2)); > uuid_copy(&to->di_uuid, &from->di_uuid); > } > Index: b/fs/xfs/xfs_inode.h > =================================================================== > --- a/fs/xfs/xfs_inode.h > +++ b/fs/xfs/xfs_inode.h > @@ -159,6 +159,9 @@ typedef struct xfs_icdinode { > __uint64_t di_changecount; /* number of attribute changes */ > xfs_lsn_t di_lsn; /* flush sequence */ > __uint64_t di_flags2; /* more random flags */ > + xfs_ino_t di_parino; /* parent inode */ > + __uint32_t di_paroff; /* offset into parent directory */ > + __uint32_t di_allocpolicy; /* allocation policy number */ > __uint8_t di_pad2[16]; /* more padding for future expansion */ > > /* fields only written to during inode creation */ > > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs > _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 18:56 ` Eric Sandeen @ 2013-04-10 19:39 ` Rich Johnston 2013-04-10 20:31 ` Mark Tinguely 0 siblings, 1 reply; 9+ messages in thread From: Rich Johnston @ 2013-04-10 19:39 UTC (permalink / raw) To: Eric Sandeen; +Cc: Mark Tinguely, xfs On 04/10/2013 01:56 PM, Eric Sandeen wrote: > On 4/10/13 1:24 PM, Mark Tinguely wrote: >> Reserve fields in new inode layout for parent pointer and >> allocation policy. >> ---- >> The inode will hold the parent information for the first >> link to a file. Information for the other links will be >> held in extended attribute entries. >> >> The "di_parino" is the inode of the parent directory. The >> directory information for this entry is located the parent >> directory with "di_paroff" offset. >> >> The di_parino/di_paroff concept code is running. >> ---- >> The "di_allocpolicy" will be used to remember the allocation >> policy associated with this inode. > > can you say more about this allocation policy? > > -Eric No its super secret information. ;) Its on my plate Eric, because Mark was making a change for parent ptrs, I asked him to request space for allocation policies also. I don't have all the details yet but here is a very high level concept. Identify allocation groups by names (or numbers -- preferably using names in user-visible areas), allowing many different areas. Placing the allocation policy outside of user programs is necessary for this to be successful. Current thoughts on proposed a layered allocation policies: Policy for the entire filesystem Policy attached to a directory (whose policy would be inherited by subdirectories when subdirectories are created) Policy for a single file The policy would define: where to place file data where to place metadata for the files. a prefered allocation group for placing file data (for directories). --Rich _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 19:39 ` Rich Johnston @ 2013-04-10 20:31 ` Mark Tinguely 2013-04-10 20:40 ` Eric Sandeen 2013-04-11 3:28 ` Dave Chinner 0 siblings, 2 replies; 9+ messages in thread From: Mark Tinguely @ 2013-04-10 20:31 UTC (permalink / raw) To: Rich Johnston; +Cc: Eric Sandeen, xfs On 04/10/13 14:39, Rich Johnston wrote: > On 04/10/2013 01:56 PM, Eric Sandeen wrote: >> On 4/10/13 1:24 PM, Mark Tinguely wrote: >>> Reserve fields in new inode layout for parent pointer and >>> allocation policy. >>> ---- >>> The inode will hold the parent information for the first >>> link to a file. Information for the other links will be >>> held in extended attribute entries. >>> >>> The "di_parino" is the inode of the parent directory. The >>> directory information for this entry is located the parent >>> directory with "di_paroff" offset. >>> >>> The di_parino/di_paroff concept code is running. >>> ---- >>> The "di_allocpolicy" will be used to remember the allocation >>> policy associated with this inode. >> >> can you say more about this allocation policy? >> >> -Eric > > No its super secret information. ;) > > Its on my plate Eric, because Mark was making a change for parent ptrs, > I asked him to request space for allocation policies also. > > I don't have all the details yet but here is a very high level concept. > > Identify allocation groups by names (or numbers -- preferably using names > in user-visible areas), allowing many different areas. Placing the > allocation > policy outside of user programs is necessary for this to be successful. > > Current thoughts on proposed a layered allocation policies: > > Policy for the entire filesystem > Policy attached to a directory (whose policy would be inherited by > subdirectories when subdirectories are created) > Policy for a single file > > The policy would define: > > where to place file data > where to place metadata for the files. > a prefered allocation group for placing file data (for directories). > > --Rich > > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs The allocation policies is based on work by Dave: http://oss.sgi.com/archives/xfs/2009-02/msg00250.html --Mark. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 20:31 ` Mark Tinguely @ 2013-04-10 20:40 ` Eric Sandeen 2013-04-11 3:28 ` Dave Chinner 1 sibling, 0 replies; 9+ messages in thread From: Eric Sandeen @ 2013-04-10 20:40 UTC (permalink / raw) To: Mark Tinguely; +Cc: Rich Johnston, xfs On 4/10/13 3:31 PM, Mark Tinguely wrote: > On 04/10/13 14:39, Rich Johnston wrote: >> On 04/10/2013 01:56 PM, Eric Sandeen wrote: >>> On 4/10/13 1:24 PM, Mark Tinguely wrote: >>>> Reserve fields in new inode layout for parent pointer and >>>> allocation policy. >>>> ---- >>>> The inode will hold the parent information for the first >>>> link to a file. Information for the other links will be >>>> held in extended attribute entries. >>>> >>>> The "di_parino" is the inode of the parent directory. The >>>> directory information for this entry is located the parent >>>> directory with "di_paroff" offset. >>>> >>>> The di_parino/di_paroff concept code is running. >>>> ---- >>>> The "di_allocpolicy" will be used to remember the allocation >>>> policy associated with this inode. >>> >>> can you say more about this allocation policy? >>> >>> -Eric >> >> No its super secret information. ;) >> >> Its on my plate Eric, because Mark was making a change for parent ptrs, >> I asked him to request space for allocation policies also. >> >> I don't have all the details yet but here is a very high level concept. >> >> Identify allocation groups by names (or numbers -- preferably using names >> in user-visible areas), allowing many different areas. Placing the >> allocation >> policy outside of user programs is necessary for this to be successful. >> >> Current thoughts on proposed a layered allocation policies: >> >> Policy for the entire filesystem >> Policy attached to a directory (whose policy would be inherited by >> subdirectories when subdirectories are created) >> Policy for a single file >> >> The policy would define: >> >> where to place file data >> where to place metadata for the files. >> a prefered allocation group for placing file data (for directories). >> >> --Rich >> > > The allocation policies is based on work by Dave: > > http://oss.sgi.com/archives/xfs/2009-02/msg00250.html > > --Mark. > Great, thanks guys. I had forgotten about that TBH. Just wondered if you had it in the works, or reserving just in case, or . . . -Eric _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 20:31 ` Mark Tinguely 2013-04-10 20:40 ` Eric Sandeen @ 2013-04-11 3:28 ` Dave Chinner 1 sibling, 0 replies; 9+ messages in thread From: Dave Chinner @ 2013-04-11 3:28 UTC (permalink / raw) To: Mark Tinguely; +Cc: Eric Sandeen, Rich Johnston, xfs On Wed, Apr 10, 2013 at 03:31:53PM -0500, Mark Tinguely wrote: > On 04/10/13 14:39, Rich Johnston wrote: > >On 04/10/2013 01:56 PM, Eric Sandeen wrote: > >>On 4/10/13 1:24 PM, Mark Tinguely wrote: > >>>The "di_allocpolicy" will be used to remember the allocation > >>>policy associated with this inode. > >> > >>can you say more about this allocation policy? > >> > >>-Eric > > > >No its super secret information. ;) > > > >Its on my plate Eric, because Mark was making a change for parent ptrs, > >I asked him to request space for allocation policies also. > > > >I don't have all the details yet but here is a very high level concept. The on-disk format is a low level detail that falls out at the bottom of the design/implementation/review cycle. It doesn't get defined by high level concepts... > >Identify allocation groups by names (or numbers -- preferably using names > >in user-visible areas), allowing many different areas. Placing the > >allocation > >policy outside of user programs is necessary for this to be successful. > > > >Current thoughts on proposed a layered allocation policies: > > > >Policy for the entire filesystem > >Policy attached to a directory (whose policy would be inherited by > >subdirectories when subdirectories are created) > >Policy for a single file > > > >The policy would define: > > > >where to place file data > >where to place metadata for the files. > >a prefered allocation group for placing file data (for directories). Which is a summary of what this code: > The allocation policies is based on work by Dave: > > http://oss.sgi.com/archives/xfs/2009-02/msg00250.html started with and was building on. What I was trying to get to in that patch series was an arbitrarily extensible allocation policy infrastructure, and that patch set was proof-of-concept code I used to flesh out ideas. Yes, it used 32 bits in the inode, but keep in mind that changed several times in the patch set as I implemented new stuff and changed heirarchies, definitions and concepts mid-patchset. But it isn't a reference design - it was a research vehicle..... Indeed, the patch set is an exact demonstration of the functionality required by the on-disk format not being properly understood until the functionality has at been fully implemented in a POC. And I hadn't got anywhere near that with the above patch set. As it is, I think that an extended attribute is be a better place for allocation policy information. An xattr is far easier to modify from userspace, and allows arbitrary allocator primitives being exposed to policy control. That was the problem with the above patch set - it didn't expose policy controls individually - it exposed them as a defined set, and only the sets defined by the kernel were possible. As such, direct control of locality is simply not possible with the above patch set. Having an attribute format that defines all the different control parameters allows userspace to define complex policies that match the specific storage topology of the system the policy is designed for. This is impossible to express in a standard kernel with the approach I was taking in the above patch set, and i was already thinking about xattr based primitives to allow fine-grained exposure of the allocation primitives I'd abstracted out of the kernel code... IOWs, We need to have agreement on design and implementation direction of a feature before we consider what the on-disk format is going to be, so reserving space on disk is extremely premature at this point.... Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely 2013-04-10 18:56 ` Eric Sandeen @ 2013-04-11 3:00 ` Dave Chinner 2013-04-11 13:54 ` Mark Tinguely 1 sibling, 1 reply; 9+ messages in thread From: Dave Chinner @ 2013-04-11 3:00 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote: > Reserve fields in new inode layout for parent pointer and > allocation policy. Not without a design review - there is no way we are going to blindly reserve space in any on-disk structure without first having a solid design, published proof-of-concept code and agreement that the format changes being proposed are the right way to proceed. Where are the design docs/code for these features? > ---- > The inode will hold the parent information for the first > link to a file. Information for the other links will be > held in extended attribute entries. > > The "di_parino" is the inode of the parent directory. The > directory information for this entry is located the parent > directory with "di_paroff" offset. > > The di_parino/di_paroff concept code is running. How does it handle hard links? (i.e. multiple parents) Also, inode number is not enough for unique identification of the parent - generation number is also required. FWIW, lets go back to the (was almost finished) parent pointer code from 2009: http://oss.sgi.com/archives/xfs/2009-01/msg01068.html That uses xattrs to store parent information - inode #, generation #, and a counter - for each parent. It requires a counter because you can have the same inode hard linked into the same parent directory multiple times: typedef struct xfs_parent_eaname_rec { __be64 p_ino; __be32 p_gen; __be32 p_cnt; } xfs_parent_eaname_rec_t; And a transaction appended to the the inode create, link, rename and remove operations to manage the xattrs so all cases involving hard links work just fine. Indeed, the single di_parino/di_paroff concept was one of the original designs considered because of it's simplicity, but it was rejected because of that very simplicity - it cannot handle common use cases involving hardlinks. Release early, release often? > ---- > The "di_allocpolicy" will be used to remember the allocation > policy associated with this inode. This is exactly what we have padding in the inode for - so that future additions to the on-disk inode can be added via feature bits to indicate the fields are present. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-11 3:00 ` Dave Chinner @ 2013-04-11 13:54 ` Mark Tinguely 2013-04-12 0:24 ` Dave Chinner 0 siblings, 1 reply; 9+ messages in thread From: Mark Tinguely @ 2013-04-11 13:54 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs On 04/10/13 22:00, Dave Chinner wrote: > On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote: >> Reserve fields in new inode layout for parent pointer and >> allocation policy. > > Not without a design review - there is no way we are going to > blindly reserve space in any on-disk structure without first having > a solid design, published proof-of-concept code and agreement that > the format changes being proposed are the right way to proceed. > > Where are the design docs/code for these features? Then you might want to bump the pad size. > >> ---- >> The inode will hold the parent information for the first >> link to a file. Information for the other links will be >> held in extended attribute entries. >> >> The "di_parino" is the inode of the parent directory. The >> directory information for this entry is located the parent >> directory with "di_paroff" offset. >> >> The di_parino/di_paroff concept code is running. > > How does it handle hard links? (i.e. multiple parents) As I stated, the hard links will use extended attribute entries. > > Also, inode number is not enough for unique identification of the > parent - generation number is also required. Per the 2005 XFS features meeting, the inode and directory offset will uniquely describe a link - (thank-you to Glen Overby for that observation). The concept code just using one link verifies this fact. Using the inode/offset as the identifier is compact and also gives information to the file name. > > FWIW, lets go back to the (was almost finished) parent pointer > code from 2009: > > http://oss.sgi.com/archives/xfs/2009-01/msg01068.html yep, read it, studied it, plan to use parts of it but only where needed. > > That uses xattrs to store parent information - inode #, generation > #, and a counter - for each parent. It requires a counter because > you can have the same inode hard linked into the same parent > directory multiple times: > > typedef struct xfs_parent_eaname_rec { > __be64 p_ino; > __be32 p_gen; > __be32 p_cnt; > } xfs_parent_eaname_rec_t; > > And a transaction appended to the the inode create, link, rename and > remove operations to manage the xattrs so all cases involving hard > links work just fine. > > Indeed, the single di_parino/di_paroff concept was one of the > original designs considered because of it's simplicity, but it was > rejected because of that very simplicity - it cannot handle common > use cases involving hardlinks. According to Geoffrey's notes, the hybrid approach was discussed too. The single link case will save all the EA operations for the majority of the inodes. > > Release early, release often? No, trust but verify. > >> ---- >> The "di_allocpolicy" will be used to remember the allocation >> policy associated with this inode. > > This is exactly what we have padding in the inode for - so that > future additions to the on-disk inode can be added via feature bits > to indicate the fields are present. > > Cheers, > > Dave. --Mark _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy 2013-04-11 13:54 ` Mark Tinguely @ 2013-04-12 0:24 ` Dave Chinner 0 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2013-04-12 0:24 UTC (permalink / raw) To: Mark Tinguely; +Cc: xfs On Thu, Apr 11, 2013 at 08:54:20AM -0500, Mark Tinguely wrote: > On 04/10/13 22:00, Dave Chinner wrote: > >On Wed, Apr 10, 2013 at 01:24:24PM -0500, Mark Tinguely wrote: > >>Reserve fields in new inode layout for parent pointer and > >>allocation policy. > > > >Not without a design review - there is no way we are going to > >blindly reserve space in any on-disk structure without first having > >a solid design, published proof-of-concept code and agreement that > >the format changes being proposed are the right way to proceed. > > > >Where are the design docs/code for these features? > > Then you might want to bump the pad size. Not without good reason. Padding is not designed to accomodate new features that require 16 bytes of core inode space. If you need that amount of space in the inode, we have EAs for that.... The plan of record for the past 5-6 years has been that parent pointers require an arbitrary amount of space in the inode so they need to be entirely EA based. Hence I'm not prepared to agree to an inode core change for a shiny new parent pointer implementation I first heard about 2 days ago and know absolutely nothing about. You need to show everyone the code so we can get some idea of what benefit putting this information in the inode core will bring us over a purely EA based implementation. FWIW, inodes are versioned and it's trivial to bump the inode version and add feature bit if an inode core format change is absolutely necessary. From that perspective, I don't see any need to rush a change at this point in time. > >> ---- > >>The inode will hold the parent information for the first > >>link to a file. Information for the other links will be > >>held in extended attribute entries. > >> > >>The "di_parino" is the inode of the parent directory. The > >>directory information for this entry is located the parent > >>directory with "di_paroff" offset. > >> > >>The di_parino/di_paroff concept code is running. > > > >How does it handle hard links? (i.e. multiple parents) > > As I stated, the hard links will use extended attribute entries. That will add significant complexity of the implementation. > >Also, inode number is not enough for unique identification of the > >parent - generation number is also required. > > Per the 2005 XFS features meeting, I was at those meetings and wasn't allowed to keep a copy of those notes when I left SGI. If you want to refer to them, you'll need to post the notes so everyone knows what was discussed. ;) > the inode and directory offset > will uniquely describe a link It doesn't, though. An inode can only be uniquely identified by the combination of the inode+generation number. The inode number identifies an inode, but it doesn't identify the lifecycle instance of the inode that the reference is valid for. If this parent pointer link is going to be in any way useful for metadata scrubbing and repair, then it has to identify the instance of the parent inode that it points to. Note that I'm not saying that the EA format in the 2009 patch set is perfect - I think it needs a couple of significant tweaks. What I am saying is that there is good reasons for the format of information in that EA. > - (thank-you to Glen Overby for that > observation). The concept code just using one link verifies this > fact. Using the inode/offset as the identifier is compact and also > gives information to the file name. Yup, using d_off was also considered, but has problems, too. The problem with pointing to the directory offset is that if the directory is rebuilt for any reason (e.g. by xfs_repair), then the directory offset in the child inodes is now invalid and needs to also be fixed. And then you have the problem of needing multiple parent links for a single parent directory as you have to instantiate every single hard link to the same directory. In the current EA patchset, this does not occur as the parent structure simply counts the number of references to a given parent directory. The EA approach just points to the parent directory and keeps a count of the number of links back to the directory. Hence as long as repair doesn't reallocate the directory inode, the parent pointer information remains intact. i.e. There is less dependency between the objects and so the format is much more robust against unexpected changes to directory structures. This concedes the fact that inode number to name resolution needs to be a userspace problem. This is something we have to conceed because the presence of namespaces, security models, bind mounts and chroot environments use top-down path traversal to limit what part of the filesystem a userspace process can actually see. In-kernel resolution of reverse path names is a potential vector for information leakage between namespaces and at it's worst, a provide a method for escape. Hence there are security concerns about parent pointers that we have to take into account as well. The EA approach is also allows more flexibility for potential future directory features. For example: online defragmentation. If we want defrag to be able to compact directories into a contiguous address space, we need to be able to change the offsets of the data entries. We can do this if there are no open references to the directory at the time of the defrag. However, if we fix the offsets of directory entries in the child inode parent pointers, then this form of defrag becomes impossible to implement as we cannot update the directory and all the child inodes atomically from a userspace or transactional POV. IOWs, the parent pointer format containing an exact directory offset also limits what we can do with directories in future.... > >That uses xattrs to store parent information - inode #, generation > >#, and a counter - for each parent. It requires a counter because > >you can have the same inode hard linked into the same parent > >directory multiple times: > > > >typedef struct xfs_parent_eaname_rec { > > __be64 p_ino; > > __be32 p_gen; > > __be32 p_cnt; > >} xfs_parent_eaname_rec_t; > > > >And a transaction appended to the the inode create, link, rename and > >remove operations to manage the xattrs so all cases involving hard > >links work just fine. > > > >Indeed, the single di_parino/di_paroff concept was one of the > >original designs considered because of it's simplicity, but it was > >rejected because of that very simplicity - it cannot handle common > >use cases involving hardlinks. > > According to Geoffrey's notes, the hybrid approach was discussed > too. The single link case will save all the EA operations for the > majority of the inodes. Sure, those meetings were brain storming about how we *might* solve a problem, but keep in mind the context of those meetings was that all the first and second generation XFS gurus had left SGI and in that vaccuum nobody left at SGI fully understood the problem domains being discussed. As such, there were some really crazy ideas discussed and documented, but that doesn't mean they were the best solution or even a good idea. We came up with several possible ways to implement parent pointers, but the real world sorted them out pretty quickly after the brain storming sessions. Indeed, the real world broke the initial implementation of parent pointers pretty quickly. You may not know the entire history, but XFS on Irix effectively ended up with a useless, broken parent pointer implementation because it was implemented quickly based on assumptions made in those brainstorming seesions meeting. i.e. still without fully understanding the problem space. The limitations the Irix implementation exposed resulted in it being thrown away completely and redesigned for Linux. IOWs, the EA based patchset is what evolved from the hard lessons that Irix taught us.... > >Release early, release often? > > No, trust but verify. Verification can't be done if you don't tell us anything about what you are doing. Verification needs to be done at the design/POC phase just as much as at the final patch review stage, because there are many parties that have different requirements for the same feature. Open source development is supposed to be, well, open. Public. Not behind closed doors. Design is open, implementation is open, flamewars aout stuff are out there in public. Nothing is done behind closed doors. Saying "we're doing X" and then not telling anyone about it is the antithesis of OSS. It -excludes- the community from the process of developing the new features, and prevents them from verifying what you are doing is suitable for their needs. IOWs, there is no possibility of "trust, but verify" in OSS without "release early, release often". You develop trust by being open and allowing people to review and verify what you are doing at every stage of development - from POC through to production code. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-04-12 0:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-04-10 18:24 [PATCH] xfs: reserve fields in inode for parent ptr and alloc policy Mark Tinguely 2013-04-10 18:56 ` Eric Sandeen 2013-04-10 19:39 ` Rich Johnston 2013-04-10 20:31 ` Mark Tinguely 2013-04-10 20:40 ` Eric Sandeen 2013-04-11 3:28 ` Dave Chinner 2013-04-11 3:00 ` Dave Chinner 2013-04-11 13:54 ` Mark Tinguely 2013-04-12 0:24 ` Dave Chinner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox