* [RFC 00/17] RFC parent inode pointers.
@ 2014-01-15 22:00 Mark Tinguely
2014-01-15 22:00 ` [RFC 01/17] xfs: (parent ptr) get offset when adding directory name Mark Tinguely
` (17 more replies)
0 siblings, 18 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
Yeah, yeah, this has gotten buried several times and better get out on
the list for discussion before it gets buried yet again.
Parent inode support allow XFS to quickly derive a file name and
path from the mount point. This can aid in directory/path policies
and can help relocate items during filesystem shrink.
1) Representation of a parent inode entry:
There are several ways to represent the parent inode entry.
A 2005 XFS meeting came up with the following ideas:
1) Storing the parent inode# list inside the inode with a separate field
separate fork
2) Storing the parent inode# list in EA names with null values
EA: <name=inode#, value=NULL>
3) As in (2) but store the 1st parent inode# in a field in the inode
first-parent-ptr + <name=inode#, value=NULL>
4) As in (2) but store the hardlink names as EA values
EA: <name=inode#, value=fname>
5) As in (2) but store the EAs on directories as well as leaf files
EAs on directories.
6) Storing the parent inode# + directory offset in EA names with null values
EA: <name=inode#diroffset, value=NULL>
7) (kind of (4) + (6))
EA: <name=inode#diroffset, value=filename>
The preferred method was #6. Using directory and the entry offset into the
directory has turned out to be a very good idea. Directory growth and
contractions and xfs_repair does not compromise the encoding. The offset
can be gotten while doing the directory code. It is compact and easy the
parent inode / offset allows quick access to the filename information.
2) In the inode core or not?
Since we have new inode, adding the first link into the inode core
makes things very convenient. I implemented and tested both ways,
and prefer adding the first link in the inode core. One less fork
to worry about on single linked entries, like directories. xfs_create
and xfs_symlink do not need extended attribute calls, and simplifies the
parent path creation.
This implementation of the code uses 12 bytes of the inode padding for
parent pointers and places the first link in the inode core..
3) To lock between directory and attribute changes.
On one hand, the vfs mutex will keep the directory and attribute changes
in sync.
On the other hand keeping the directory and extended attribute in one
transaction should keeping the changes atomic when the filesystem
is forced down between the directory and attribute changes. Despite
all the gore (see below) of doing the directory and attribute changes
in one transaction, I think it is the correct thing to do.
The gore of keeping the directory and attribute operations in one transaction:
1) The attribute code was not written to be embedded in other functions.
The attribute code can commit and trans_dup another transaction
(xfs_trans_roll and xfs_bmapi_finish). The attribute operations have
to be done last in the transactions and even then a terrible hack has
to be done to figure out if the transaction was commited in an earlier
attribute operation so we could add the inode back into the transaction.
2) xfs_rename is log space reservation expensive.
The log_count:
xfs_rename 2
xfs_attr_set 3 and that does not add any extra for the embedded
xfs_bmap_add_attrfork()
xfs_attr_remove 3 and we can have 2 of these in a rename.
3) xfs_rename() with no allocated space reserve blocks can cause
hanging. I disabled the code in xfs_rename that:
error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, spaceres, 0);
#ifdef HERE
if (error == ENOSPC) {
spaceres = 0;
error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, 0, 0);
}
#endif
...
----
The enclosed RFC sample code does NOT keep the directory and attribute in
the same transaction and this code is simpler than the embedded version.
It comes with the following patches:
----- kernel space bits -----
get the offset patches:
01-pptr-retrieve-info-from-directory.patch
02-pptr-add-to_xfs_dir_removename.patch
03-pptr-add-to_xfs_dir_replace.patch
add the incompatability bit and the parent test:
04-pptr-add-xfs_sb_version_hasparent.patch
add the parent flags to the attribute files:
05-pptr-add-xfs_parent-files.patch
add the inode parent / offset to the inode:
06-pptr-add-inode-incore.patch
add the support to the differ routines:
07-pptr-add-to-xfs_create.patch
08-pptr-add-to-xfs_symlink.patch
09-pptr-add-to_xfs_link.patch
10-pptr-add-to_xfs_remove.patch
11-pptr-add-incore-to-rename.patch
xfs_rename is the most complex. I tried to add to the inode entries first
and then keep the number of extended attribute operations down. The single
transaction xfs_rename is even more complicated than this code.
add the ioctl to get the paths. SGI may want to add the inode generation
field to the output and offset information can be dropped. Minor stuff:
12-pptr-add-incore-to-getparentpaths_by_handle.patch
----- user space bits -----
The user space stuff is include for anyone who want to
kick the tires. Parent pointer requires a v5 super block
(-m crc=1") and "-i parent=1". xfstests 114 indicated that
the parent option belongs on the "-i" option.
01-xfsprogs-add-parent-pointers.patch
The xfs_repair change is there to prevent the attribute to be
thrown away because the name as corrupt. Much more xfs_repair
(and xfs_db) work is needed:
02-xfsprogs-pptr-fix-namecheck.patch
Tiny xfs_db code to list the parent/offset information:
03-xfsprogs-pptr-add-to-xfs_db.patch
Added the ioctl support. Fixed xfs_io parent -p and -c commands
With xfs_io a person can dump the parent pointer information by
path and do a consistency check. xfstest xfs/114 will run correctly.
The output will fail since I export the parent's offset instead of
the inode's generation. As mentioned above, that will probably changed:
04-xfsprogs-pptr-xfs_io-getparentpaths_by_handle.c
Add the XFS_GEOM for xfs_repair/xfs_info:
05-xfsprogs-pptr-add-to-xfs_info.patch
--Mark
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 01/17] xfs: (parent ptr) get offset when adding directory name
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 02/17] xfs: (parent ptr) get offset when removing " Mark Tinguely
` (16 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 01-pptr-retrieve-info-from-directory.patch --]
[-- Type: text/plain, Size: 6437 bytes --]
Return the directory offset information when adding an entry to the
directory.
This offset will be used as the parent pointer offset in xfs_create,
xfs_symlink, xfs_link and xfs_rename.
---
fs/xfs/xfs_da_btree.h | 1 +
fs/xfs/xfs_dir2.c | 6 +++++-
fs/xfs/xfs_dir2.h | 3 ++-
fs/xfs/xfs_dir2_block.c | 1 +
fs/xfs/xfs_dir2_leaf.c | 2 ++
fs/xfs/xfs_dir2_node.c | 2 ++
fs/xfs/xfs_dir2_sf.c | 2 ++
fs/xfs/xfs_inode.c | 7 ++++---
fs/xfs/xfs_symlink.c | 2 +-
9 files changed, 20 insertions(+), 6 deletions(-)
Index: b/fs/xfs/xfs_da_btree.h
===================================================================
--- a/fs/xfs/xfs_da_btree.h
+++ b/fs/xfs/xfs_da_btree.h
@@ -66,6 +66,7 @@ typedef struct xfs_da_args {
int rmtblkcnt2; /* remote attr value block count */
int op_flags; /* operation flags */
enum xfs_dacmp cmpresult; /* name compare result for lookups */
+ __uint32_t offset; /* OUT: offset in directory */
} xfs_da_args_t;
/*
Index: b/fs/xfs/xfs_dir2.c
===================================================================
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -203,7 +203,8 @@ xfs_dir_createname(
xfs_ino_t inum, /* new entry inode number */
xfs_fsblock_t *first, /* bmap's firstblock */
xfs_bmap_free_t *flist, /* bmap's freeblock list */
- xfs_extlen_t total) /* bmap's total block count */
+ xfs_extlen_t total, /* bmap's total block count */
+ __uint32_t *offset) /* OUT entry's dir offset */
{
xfs_da_args_t args;
int rval;
@@ -240,6 +241,9 @@ xfs_dir_createname(
rval = xfs_dir2_leaf_addname(&args);
else
rval = xfs_dir2_node_addname(&args);
+ /* return the location that this entry was place in the parent inode */
+ if (offset)
+ *offset = args.offset;
return rval;
}
Index: b/fs/xfs/xfs_dir2.h
===================================================================
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -119,7 +119,8 @@ extern int xfs_dir_init(struct xfs_trans
extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, xfs_ino_t inum,
xfs_fsblock_t *first,
- struct xfs_bmap_free *flist, xfs_extlen_t tot);
+ struct xfs_bmap_free *flist, xfs_extlen_t tot,
+ __uint32_t *offset);
extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, xfs_ino_t *inum,
struct xfs_name *ci_name);
Index: b/fs/xfs/xfs_dir2_block.c
===================================================================
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -554,6 +554,7 @@ xfs_dir2_block_addname(
dp->d_ops->data_put_ftype(dep, args->filetype);
tagp = dp->d_ops->data_entry_tag_p(dep);
*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+ args->offset = xfs_dir2_byte_to_dataptr(mp, (char *)dep - (char *)hdr);
/*
* Clean up the bestfree array and log the header, tail, and entry.
*/
Index: b/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -860,6 +860,8 @@ xfs_dir2_leaf_addname(
dp->d_ops->data_put_ftype(dep, args->filetype);
tagp = dp->d_ops->data_entry_tag_p(dep);
*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+ args->offset = xfs_dir2_db_off_to_dataptr(mp, use_block,
+ (char *)dep - (char *)hdr);
/*
* Need to scan fix up the bestfree table.
*/
Index: b/fs/xfs/xfs_dir2_node.c
===================================================================
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -1973,6 +1973,8 @@ xfs_dir2_node_addname_int(
dp->d_ops->data_put_ftype(dep, args->filetype);
tagp = dp->d_ops->data_entry_tag_p(dep);
*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+ args->offset = xfs_dir2_db_off_to_dataptr(mp, dbno,
+ (char *)dep - (char *)hdr);
xfs_dir2_data_log_entry(tp, dp, dbp, dep);
/*
* Rescan the block for bestfree if needed.
Index: b/fs/xfs/xfs_dir2_sf.c
===================================================================
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -425,6 +425,7 @@ xfs_dir2_sf_addname_easy(
memcpy(sfep->name, args->name, sfep->namelen);
dp->d_ops->sf_put_ino(sfp, sfep, args->inumber);
dp->d_ops->sf_put_ftype(sfep, args->filetype);
+ args->offset = xfs_dir2_byte_to_dataptr(dp->i_mount, offset);
/*
* Update the header and inode.
@@ -520,6 +521,7 @@ xfs_dir2_sf_addname_hard(
memcpy(sfep->name, args->name, sfep->namelen);
dp->d_ops->sf_put_ino(sfp, sfep, args->inumber);
dp->d_ops->sf_put_ftype(sfep, args->filetype);
+ args->offset = xfs_dir2_byte_to_dataptr(dp->i_mount, offset);
sfp->count++;
#if XFS_BIG_INUMS
if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1264,7 +1264,8 @@ xfs_create(
error = xfs_dir_createname(tp, dp, name, ip->i_ino,
&first_block, &free_list, resblks ?
- resblks - XFS_IALLOC_SPACE_RES(mp) : 0);
+ resblks - XFS_IALLOC_SPACE_RES(mp) : 0,
+ NULL);
if (error) {
ASSERT(error != ENOSPC);
goto out_trans_abort;
@@ -1402,7 +1403,7 @@ xfs_link(
xfs_bmap_init(&free_list, &first_block);
error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
- &first_block, &free_list, resblks);
+ &first_block, &free_list, resblks, NULL);
if (error)
goto abort_return;
xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -2736,7 +2737,7 @@ xfs_rename(
*/
error = xfs_dir_createname(tp, target_dp, target_name,
src_ip->i_ino, &first_block,
- &free_list, spaceres);
+ &free_list, spaceres, NULL);
if (error == ENOSPC)
goto error_return;
if (error)
Index: b/fs/xfs/xfs_symlink.c
===================================================================
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -375,7 +375,7 @@ xfs_symlink(
* Create the directory entry for the symlink.
*/
error = xfs_dir_createname(tp, dp, link_name, ip->i_ino,
- &first_block, &free_list, resblks);
+ &first_block, &free_list, resblks, NULL);
if (error)
goto error2;
xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 02/17] xfs: (parent ptr) get offset when removing directory name
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
2014-01-15 22:00 ` [RFC 01/17] xfs: (parent ptr) get offset when adding directory name Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 03/17] xfs: (parent ptr) get offset when replacing a " Mark Tinguely
` (15 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 02-pptr-add-to_xfs_dir_removename.patch --]
[-- Type: text/plain, Size: 4425 bytes --]
Return the directory offset information when removing an entry to the
directory.
This offset will be used as the parent pointer offset in xfs_remove.
---
fs/xfs/xfs_dir2.c | 5 ++++-
fs/xfs/xfs_dir2.h | 3 ++-
fs/xfs/xfs_dir2_block.c | 1 +
fs/xfs/xfs_dir2_leaf.c | 1 +
fs/xfs/xfs_dir2_node.c | 1 +
fs/xfs/xfs_dir2_sf.c | 2 ++
fs/xfs/xfs_inode.c | 4 ++--
7 files changed, 13 insertions(+), 4 deletions(-)
Index: b/fs/xfs/xfs_dir2.c
===================================================================
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -340,7 +340,8 @@ xfs_dir_removename(
xfs_ino_t ino,
xfs_fsblock_t *first, /* bmap's firstblock */
xfs_bmap_free_t *flist, /* bmap's freeblock list */
- xfs_extlen_t total) /* bmap's total block count */
+ xfs_extlen_t total, /* bmap's total block count */
+ __uint32_t *offset) /* out: return entry's dir offset */
{
xfs_da_args_t args;
int rval;
@@ -374,6 +375,8 @@ xfs_dir_removename(
rval = xfs_dir2_leaf_removename(&args);
else
rval = xfs_dir2_node_removename(&args);
+ if (offset)
+ *offset = args.offset;
return rval;
}
Index: b/fs/xfs/xfs_dir2.h
===================================================================
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -127,7 +127,8 @@ extern int xfs_dir_lookup(struct xfs_tra
extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, xfs_ino_t ino,
xfs_fsblock_t *first,
- struct xfs_bmap_free *flist, xfs_extlen_t tot);
+ struct xfs_bmap_free *flist, xfs_extlen_t tot,
+ __uint32_t *offset);
extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, xfs_ino_t inum,
xfs_fsblock_t *first,
Index: b/fs/xfs/xfs_dir2_block.c
===================================================================
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -796,6 +796,7 @@ xfs_dir2_block_removename(
/*
* Point to the data entry using the leaf entry.
*/
+ args->offset = be32_to_cpu(blp[ent].address);
dep = (xfs_dir2_data_entry_t *)
((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
/*
Index: b/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1381,6 +1381,7 @@ xfs_dir2_leaf_removename(
* Point to the leaf entry, use that to point to the data entry.
*/
lep = &ents[index];
+ args->offset = be32_to_cpu(lep->address);
db = xfs_dir2_dataptr_to_db(mp, be32_to_cpu(lep->address));
dep = (xfs_dir2_data_entry_t *)
((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address)));
Index: b/fs/xfs/xfs_dir2_node.c
===================================================================
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -1192,6 +1192,7 @@ xfs_dir2_leafn_remove(
/*
* Extract the data block and offset from the entry.
*/
+ args->offset = be32_to_cpu(lep->address);
db = xfs_dir2_dataptr_to_db(mp, be32_to_cpu(lep->address));
ASSERT(dblk->blkno == db);
off = xfs_dir2_dataptr_to_off(mp, be32_to_cpu(lep->address));
Index: b/fs/xfs/xfs_dir2_sf.c
===================================================================
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -847,6 +847,8 @@ xfs_dir2_sf_removename(
XFS_CMP_EXACT) {
ASSERT(dp->d_ops->sf_get_ino(sfp, sfep) ==
args->inumber);
+ args->offset = xfs_dir2_byte_to_dataptr(dp->i_mount,
+ xfs_dir2_sf_get_offset(sfep));
break;
}
}
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2542,7 +2542,7 @@ xfs_remove(
xfs_bmap_init(&free_list, &first_block);
error = xfs_dir_removename(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks);
+ &first_block, &free_list, resblks, NULL);
if (error) {
ASSERT(error != ENOENT);
goto out_bmap_cancel;
@@ -2847,7 +2847,7 @@ xfs_rename(
}
error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
- &first_block, &free_list, spaceres);
+ &first_block, &free_list, spaceres, NULL);
if (error)
goto abort_return;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 03/17] xfs: (parent ptr) get offset when replacing a directory name
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
2014-01-15 22:00 ` [RFC 01/17] xfs: (parent ptr) get offset when adding directory name Mark Tinguely
2014-01-15 22:00 ` [RFC 02/17] xfs: (parent ptr) get offset when removing " Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 04/17] xfs: (parent ptr) add parent pointer support to xfs_sb.h Mark Tinguely
` (14 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 03-pptr-add-to_xfs_dir_replace.patch --]
[-- Type: text/plain, Size: 4576 bytes --]
Return the directory offset information when replacing an entry to the
directory.
This offset will be used as the parent pointer offset in xfs_rename.
---
fs/xfs/xfs_dir2.c | 5 ++++-
fs/xfs/xfs_dir2.h | 3 ++-
fs/xfs/xfs_dir2_block.c | 1 +
fs/xfs/xfs_dir2_leaf.c | 1 +
fs/xfs/xfs_dir2_node.c | 1 +
fs/xfs/xfs_dir2_sf.c | 3 +++
fs/xfs/xfs_inode.c | 8 ++++----
7 files changed, 16 insertions(+), 6 deletions(-)
Index: b/fs/xfs/xfs_dir2.c
===================================================================
--- a/fs/xfs/xfs_dir2.c
+++ b/fs/xfs/xfs_dir2.c
@@ -391,7 +391,8 @@ xfs_dir_replace(
xfs_ino_t inum, /* new inode number */
xfs_fsblock_t *first, /* bmap's firstblock */
xfs_bmap_free_t *flist, /* bmap's freeblock list */
- xfs_extlen_t total) /* bmap's total block count */
+ xfs_extlen_t total, /* bmap's total block count */
+ __uint32_t *offset) /* out: return entry's dir offset */
{
xfs_da_args_t args;
int rval;
@@ -427,6 +428,8 @@ xfs_dir_replace(
rval = xfs_dir2_leaf_replace(&args);
else
rval = xfs_dir2_node_replace(&args);
+ if (offset)
+ *offset = args.offset;
return rval;
}
Index: b/fs/xfs/xfs_dir2.h
===================================================================
--- a/fs/xfs/xfs_dir2.h
+++ b/fs/xfs/xfs_dir2.h
@@ -132,7 +132,8 @@ extern int xfs_dir_removename(struct xfs
extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, xfs_ino_t inum,
xfs_fsblock_t *first,
- struct xfs_bmap_free *flist, xfs_extlen_t tot);
+ struct xfs_bmap_free *flist, xfs_extlen_t tot,
+ __uint32_t *offset);
extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
struct xfs_name *name, uint resblks);
Index: b/fs/xfs/xfs_dir2_block.c
===================================================================
--- a/fs/xfs/xfs_dir2_block.c
+++ b/fs/xfs/xfs_dir2_block.c
@@ -872,6 +872,7 @@ xfs_dir2_block_replace(
/*
* Point to the data entry we need to change.
*/
+ args->offset = be32_to_cpu(blp[ent].address);
dep = (xfs_dir2_data_entry_t *)
((char *)hdr + xfs_dir2_dataptr_to_off(mp, be32_to_cpu(blp[ent].address)));
ASSERT(be64_to_cpu(dep->inumber) != args->inumber);
Index: b/fs/xfs/xfs_dir2_leaf.c
===================================================================
--- a/fs/xfs/xfs_dir2_leaf.c
+++ b/fs/xfs/xfs_dir2_leaf.c
@@ -1515,6 +1515,7 @@ xfs_dir2_leaf_replace(
/*
* Point to the data entry.
*/
+ args->offset = be32_to_cpu(lep->address);
dep = (xfs_dir2_data_entry_t *)
((char *)dbp->b_addr +
xfs_dir2_dataptr_to_off(dp->i_mount, be32_to_cpu(lep->address)));
Index: b/fs/xfs/xfs_dir2_node.c
===================================================================
--- a/fs/xfs/xfs_dir2_node.c
+++ b/fs/xfs/xfs_dir2_node.c
@@ -2186,6 +2186,7 @@ xfs_dir2_node_replace(
hdr = state->extrablk.bp->b_addr;
ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC));
+ args->offset = be32_to_cpu(lep->address);
dep = (xfs_dir2_data_entry_t *)
((char *)hdr +
xfs_dir2_dataptr_to_off(state->mp, be32_to_cpu(lep->address)));
Index: b/fs/xfs/xfs_dir2_sf.c
===================================================================
--- a/fs/xfs/xfs_dir2_sf.c
+++ b/fs/xfs/xfs_dir2_sf.c
@@ -987,6 +987,9 @@ xfs_dir2_sf_replace(
#endif
dp->d_ops->sf_put_ino(sfp, sfep, args->inumber);
dp->d_ops->sf_put_ftype(sfep, args->filetype);
+ args->offset = xfs_dir2_byte_to_dataptr(
+ dp->i_mount,
+ xfs_dir2_sf_get_offset(sfep));
break;
}
}
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2778,8 +2778,8 @@ xfs_rename(
* name at the destination directory, remove it first.
*/
error = xfs_dir_replace(tp, target_dp, target_name,
- src_ip->i_ino,
- &first_block, &free_list, spaceres);
+ src_ip->i_ino, &first_block,
+ &free_list, spaceres, NULL);
if (error)
goto abort_return;
@@ -2813,8 +2813,8 @@ xfs_rename(
* directory.
*/
error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
- target_dp->i_ino,
- &first_block, &free_list, spaceres);
+ target_dp->i_ino, &first_block,
+ &free_list, spaceres, NULL);
ASSERT(error != EEXIST);
if (error)
goto abort_return;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 04/17] xfs: (parent ptr) add parent pointer support to xfs_sb.h
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (2 preceding siblings ...)
2014-01-15 22:00 ` [RFC 03/17] xfs: (parent ptr) get offset when replacing a " Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 05/17] xfs: (parent ptr) add parent pointer support to attribute code Mark Tinguely
` (13 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 04-pptr-add-xfs_sb_version_hasparent.patch --]
[-- Type: text/plain, Size: 1465 bytes --]
Add the parent pointer support to the superblock version 5.
---
fs/xfs/xfs_sb.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_sb.h
===================================================================
--- a/fs/xfs/xfs_sb.h
+++ b/fs/xfs/xfs_sb.h
@@ -89,6 +89,7 @@ struct xfs_trans;
#define XFS_SB_VERSION2_OKREALFBITS \
(XFS_SB_VERSION2_LAZYSBCOUNTBIT | \
XFS_SB_VERSION2_ATTR2BIT | \
+ XFS_SB_VERSION2_PARENTBIT | \
XFS_SB_VERSION2_PROJID32BIT | \
XFS_SB_VERSION2_FTYPE)
#define XFS_SB_VERSION2_OKSASHFBITS \
@@ -596,8 +597,10 @@ xfs_sb_has_ro_compat_feature(
}
#define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
+#define XFS_SB_FEAT_INCOMPAT_PARENT (2 << 0) /* parent inode ptr */
#define XFS_SB_FEAT_INCOMPAT_ALL \
- (XFS_SB_FEAT_INCOMPAT_FTYPE)
+ (XFS_SB_FEAT_INCOMPAT_FTYPE | \
+ XFS_SB_FEAT_INCOMPAT_PARENT)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
@@ -639,6 +642,12 @@ static inline int xfs_sb_version_hasftyp
(sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
}
+static inline int xfs_sb_version_hasparent(struct xfs_sb *sbp)
+{
+ return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+ xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_PARENT));
+}
+
/*
* end of superblock version macros
*/
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 05/17] xfs: (parent ptr) add parent pointer support to attribute code
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (3 preceding siblings ...)
2014-01-15 22:00 ` [RFC 04/17] xfs: (parent ptr) add parent pointer support to xfs_sb.h Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 06/17] xfs: (parent ptr) add parent pointer support to inode v5 Mark Tinguely
` (12 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 05-pptr-add-xfs_parent-files.patch --]
[-- Type: text/plain, Size: 5720 bytes --]
Add the new parent attribute type. XFS_ATTR_PARENT is used only for
parent pointer entries; it uses reserved blocks like XFS_ATTR_ROOT.
---
fs/xfs/xfs_attr.c | 20 ++++++++++++--------
fs/xfs/xfs_attr.h | 2 ++
fs/xfs/xfs_da_format.h | 12 ++++++++----
fs/xfs/xfs_fs.h | 1 +
fs/xfs/xfs_fsops.c | 4 +++-
6 files changed, 65 insertions(+), 13 deletions(-)
Index: b/fs/xfs/xfs_attr.c
===================================================================
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -234,7 +234,7 @@ xfs_attr_set_int(
int error, err2, committed;
struct xfs_mount *mp = dp->i_mount;
struct xfs_trans_res tres;
- int rsvd = (flags & ATTR_ROOT) != 0;
+ int rsvd = (flags & (ATTR_ROOT | ATTR_PARENT)) != 0;
int local;
/*
@@ -444,18 +444,22 @@ xfs_attr_set(
int flags)
{
int error;
- struct xfs_name xname;
+ struct xfs_name xname, *xnamep;
XFS_STATS_INC(xs_attr_set);
if (XFS_FORCED_SHUTDOWN(dp->i_mount))
return (EIO);
- error = xfs_attr_name_to_xname(&xname, name);
- if (error)
- return error;
-
- return xfs_attr_set_int(dp, &xname, value, valuelen, flags);
+ if ((flags & ATTR_PARENT) == 0) {
+ error = xfs_attr_name_to_xname(&xname, name);
+ if (error)
+ return error;
+ xnamep = &xname;
+ } else
+ xnamep = (struct xfs_name *) name;
+ return xfs_attr_set_int(dp, xnamep, value, valuelen, flags);
+
}
/*
@@ -516,7 +520,7 @@ xfs_attr_remove_int(xfs_inode_t *dp, str
* operation if necessary
*/
- if (flags & ATTR_ROOT)
+ if (flags & (ATTR_ROOT | ATTR_PARENT))
args.trans->t_flags |= XFS_TRANS_RESERVE;
error = xfs_trans_reserve(args.trans, &M_RES(mp)->tr_attrrm,
Index: b/fs/xfs/xfs_attr.h
===================================================================
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -44,6 +44,7 @@ struct xfs_attr_list_context;
#define ATTR_SECURE 0x0008 /* use attrs in security namespace */
#define ATTR_CREATE 0x0010 /* pure create: fail if attr already exists */
#define ATTR_REPLACE 0x0020 /* pure set: fail if attr does not exist */
+#define ATTR_PARENT 0x0040 /* use attrs in parent namespace */
#define ATTR_KERNOTIME 0x1000 /* [kernel] don't update inode timestamps */
#define ATTR_KERNOVAL 0x2000 /* [kernel] get attr size only, not value */
@@ -55,6 +56,7 @@ struct xfs_attr_list_context;
{ ATTR_SECURE, "SECURE" }, \
{ ATTR_CREATE, "CREATE" }, \
{ ATTR_REPLACE, "REPLACE" }, \
+ { ATTR_PARENT, "PARENT" }, \
{ ATTR_KERNOTIME, "KERNOTIME" }, \
{ ATTR_KERNOVAL, "KERNOVAL" }
Index: b/fs/xfs/xfs_da_format.h
===================================================================
--- a/fs/xfs/xfs_da_format.h
+++ b/fs/xfs/xfs_da_format.h
@@ -895,24 +895,28 @@ struct xfs_attr3_icleaf_hdr {
#define XFS_ATTR_LOCAL_BIT 0 /* attr is stored locally */
#define XFS_ATTR_ROOT_BIT 1 /* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT 2 /* limit access to secure attrs */
+#define XFS_ATTR_PARENT_BIT 3 /* parent pointer secure attrs */
#define XFS_ATTR_INCOMPLETE_BIT 7 /* attr in middle of create/delete */
#define XFS_ATTR_LOCAL (1 << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT (1 << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE (1 << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT (1 << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_INCOMPLETE (1 << XFS_ATTR_INCOMPLETE_BIT)
/*
* Conversion macros for converting namespace bits from argument flags
* to ondisk flags.
*/
-#define XFS_ATTR_NSP_ARGS_MASK (ATTR_ROOT | ATTR_SECURE)
-#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ARGS_MASK (ATTR_ROOT | ATTR_SECURE | XFS_ATTR_PARENT)
+#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
#define XFS_ATTR_NSP_ONDISK(flags) ((flags) & XFS_ATTR_NSP_ONDISK_MASK)
#define XFS_ATTR_NSP_ARGS(flags) ((flags) & XFS_ATTR_NSP_ARGS_MASK)
#define XFS_ATTR_NSP_ARGS_TO_ONDISK(x) (((x) & ATTR_ROOT ? XFS_ATTR_ROOT : 0) |\
- ((x) & ATTR_SECURE ? XFS_ATTR_SECURE : 0))
+ ((x) & ATTR_SECURE ? XFS_ATTR_SECURE : 0) | \
+ ((x) & ATTR_PARENT ? XFS_ATTR_PARENT : 0))
#define XFS_ATTR_NSP_ONDISK_TO_ARGS(x) (((x) & XFS_ATTR_ROOT ? ATTR_ROOT : 0) |\
- ((x) & XFS_ATTR_SECURE ? ATTR_SECURE : 0))
+ ((x) & XFS_ATTR_SECURE ? ATTR_SECURE : 0) | \
+ ((x) & XFS_ATTR_PARENT ? ATTR_PARENT : 0))
/*
* Alignment for namelist and valuelist entries (since they are mixed
Index: b/fs/xfs/xfs_fs.h
===================================================================
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -238,6 +238,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_LAZYSB 0x4000 /* lazy superblock counters */
#define XFS_FSOP_GEOM_FLAGS_V5SB 0x8000 /* version 5 superblock */
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
+#define XFS_FSOP_GEOM_FLAGS_PARENT 0x20000 /* parent inode support */
/*
* Minimum and maximum sizes need for growth checks.
Index: b/fs/xfs/xfs_fsops.c
===================================================================
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -104,7 +104,9 @@ xfs_fs_geometry(
(xfs_sb_version_hascrc(&mp->m_sb) ?
XFS_FSOP_GEOM_FLAGS_V5SB : 0) |
(xfs_sb_version_hasftype(&mp->m_sb) ?
- XFS_FSOP_GEOM_FLAGS_FTYPE : 0);
+ XFS_FSOP_GEOM_FLAGS_FTYPE : 0) |
+ (xfs_sb_version_hasparent(&mp->m_sb) ?
+ XFS_FSOP_GEOM_FLAGS_PARENT : 0);
geo->logsectsize = xfs_sb_version_hassector(&mp->m_sb) ?
mp->m_sb.sb_logsectsize : BBSIZE;
geo->rtsectsize = mp->m_sb.sb_blocksize;
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 06/17] xfs: (parent ptr) add parent pointer support to inode v5
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (4 preceding siblings ...)
2014-01-15 22:00 ` [RFC 05/17] xfs: (parent ptr) add parent pointer support to attribute code Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 07/17] xfs: (parent ptr) add parent pointer support to xfs_create Mark Tinguely
` (11 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 06-pptr-add-inode-incore.patch --]
[-- Type: text/plain, Size: 2677 bytes --]
Add the fields to inode v5 to track the parent inode number
and this entry's offset in the parent inode leaving the entries
in disk format.
---
fs/xfs/xfs_dinode.h | 4 +++-
fs/xfs/xfs_inode_buf.c | 6 ++++++
fs/xfs/xfs_log_format.h | 4 +++-
3 files changed, 12 insertions(+), 2 deletions(-)
Index: b/fs/xfs/xfs_dinode.h
===================================================================
--- a/fs/xfs/xfs_dinode.h
+++ b/fs/xfs/xfs_dinode.h
@@ -79,7 +79,9 @@ typedef struct xfs_dinode {
__be64 di_changecount; /* number of attribute changes */
__be64 di_lsn; /* flush sequence */
__be64 di_flags2; /* more random flags */
- __u8 di_pad2[16]; /* more padding for future expansion */
+ __be64 di_parent; /* inode of parent directory */
+ __be32 di_poffset; /* offset into parent directory */
+ __u8 di_pad2[4]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_timestamp_t di_crtime; /* time created */
Index: b/fs/xfs/xfs_inode_buf.c
===================================================================
--- a/fs/xfs/xfs_inode_buf.c
+++ b/fs/xfs/xfs_inode_buf.c
@@ -238,6 +238,9 @@ xfs_dinode_from_disk(
to->di_lsn = be64_to_cpu(from->di_lsn);
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &from->di_uuid);
+ /* parent pointer information is left in disk order */
+ to->di_parent = from->di_parent;
+ to->di_poffset = from->di_poffset;
}
}
@@ -285,6 +288,9 @@ xfs_dinode_to_disk(
memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
uuid_copy(&to->di_uuid, &from->di_uuid);
to->di_flushiter = 0;
+ /* parent pointer information is in disk order */
+ to->di_parent = from->di_parent;
+ to->di_poffset = from->di_poffset;
} else {
to->di_flushiter = cpu_to_be16(from->di_flushiter);
}
Index: b/fs/xfs/xfs_log_format.h
===================================================================
--- a/fs/xfs/xfs_log_format.h
+++ b/fs/xfs/xfs_log_format.h
@@ -399,7 +399,9 @@ typedef struct xfs_icdinode {
__uint64_t di_changecount; /* number of attribute changes */
xfs_lsn_t di_lsn; /* flush sequence */
__uint64_t di_flags2; /* more random flags */
- __uint8_t di_pad2[16]; /* more padding for future expansion */
+ __uint64_t di_parent; /* inode of parent directory */
+ __uint32_t di_poffset; /* offset into parent directory */
+ __uint8_t di_pad2[4]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_ictimestamp_t di_crtime; /* time created */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 07/17] xfs: (parent ptr) add parent pointer support to xfs_create
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (5 preceding siblings ...)
2014-01-15 22:00 ` [RFC 06/17] xfs: (parent ptr) add parent pointer support to inode v5 Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 08/17] xfs: (parent ptr) add parent pointer support to xfs_symlink Mark Tinguely
` (10 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 07-pptr-add-to-xfs_create.patch --]
[-- Type: text/plain, Size: 1282 bytes --]
Add the parent inode / entry offset into the parent inode entries
when creating a new file or directory. The first entry will always
go into the inode.
---
fs/xfs/xfs_inode.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1163,6 +1163,7 @@ xfs_create(
struct xfs_dquot *pdqp = NULL;
struct xfs_trans_res tres;
uint resblks;
+ uint offset;
trace_xfs_create(dp, name);
@@ -1265,11 +1266,17 @@ xfs_create(
error = xfs_dir_createname(tp, dp, name, ip->i_ino,
&first_block, &free_list, resblks ?
resblks - XFS_IALLOC_SPACE_RES(mp) : 0,
- NULL);
+ &offset);
if (error) {
ASSERT(error != ENOSPC);
goto out_trans_abort;
}
+
+ if (xfs_sb_version_hasparent(&mp->m_sb)) {
+ /* set the parent pointer and offset to the inode core fields */
+ ip->i_d.di_parent = cpu_to_be64(dp->i_ino);
+ ip->i_d.di_poffset = cpu_to_be32(offset);
+ }
xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 08/17] xfs: (parent ptr) add parent pointer support to xfs_symlink
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (6 preceding siblings ...)
2014-01-15 22:00 ` [RFC 07/17] xfs: (parent ptr) add parent pointer support to xfs_create Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 09/17] xfs: (parent ptr) add parent pointer support to xfs_link Mark Tinguely
` (9 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 08-pptr-add-to-xfs_symlink.patch --]
[-- Type: text/plain, Size: 1473 bytes --]
Add the parent inode / entry offset into the parent inode entries
when creating a new symbolic link. The first entry will always
go into the inode.
---
fs/xfs/xfs_symlink.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_symlink.c
===================================================================
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -36,6 +36,7 @@
#include "xfs_bmap_util.h"
#include "xfs_error.h"
#include "xfs_quota.h"
+#include "xfs_attr.h"
#include "xfs_trans_space.h"
#include "xfs_trace.h"
#include "xfs_symlink.h"
@@ -192,6 +193,7 @@ xfs_symlink(
struct xfs_dquot *gdqp = NULL;
struct xfs_dquot *pdqp = NULL;
uint resblks;
+ uint offset;
*ipp = NULL;
@@ -375,9 +377,15 @@ xfs_symlink(
* Create the directory entry for the symlink.
*/
error = xfs_dir_createname(tp, dp, link_name, ip->i_ino,
- &first_block, &free_list, resblks, NULL);
+ &first_block, &free_list, resblks, &offset);
if (error)
goto error2;
+
+ if (xfs_sb_version_hasparent(&mp->m_sb)) {
+ /* set the parent pointer and offset to the inode core fields */
+ ip->i_d.di_parent = cpu_to_be64(dp->i_ino);
+ ip->i_d.di_poffset = cpu_to_be32(offset);
+ }
xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 09/17] xfs: (parent ptr) add parent pointer support to xfs_link
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (7 preceding siblings ...)
2014-01-15 22:00 ` [RFC 08/17] xfs: (parent ptr) add parent pointer support to xfs_symlink Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 10/17] xfs: (parent ptr) add parent pointer support to xfs_remove Mark Tinguely
` (8 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 09-pptr-add-to_xfs_link.patch --]
[-- Type: text/plain, Size: 4173 bytes --]
Add the parent inode / offset entry for a new link. If the
parent inode pointer entries are not free, then add the
entry as an extended attribute of ATTR_PARENT type. The
entry uses an unique name composed of the 64 bit inode
and the 32 bit offsets. No extend attribute value is used.
---
fs/xfs/xfs_inode.c | 43 ++++++++++++++++++++++++++++++++++++++++---
fs/xfs/xfs_parent.h | 39 +++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 3 deletions(-)
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -51,6 +51,7 @@
#include "xfs_trans_priv.h"
#include "xfs_log.h"
#include "xfs_bmap_btree.h"
+#include "xfs_parent.h"
kmem_zone_t *xfs_inode_zone;
@@ -1358,6 +1359,7 @@ xfs_link(
int cancel_flags;
int committed;
int resblks;
+ uint offset;
trace_xfs_link(tdp, target_name);
@@ -1407,12 +1409,26 @@ xfs_link(
if (error)
goto error_return;
- xfs_bmap_init(&free_list, &first_block);
+ xfs_bmap_init(&free_list, &first_block);
error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
- &first_block, &free_list, resblks, NULL);
+ &first_block, &free_list, resblks, &offset);
if (error)
goto abort_return;
+
+ if (xfs_sb_version_hasparent(&mp->m_sb) &&
+ sip->i_d.di_parent == NULLFSINO) {
+ /*
+ * store the parent information in the inode if the fields
+ * are empty. zero the offset as a flag to not add the
+ * information in an extended attribute.
+ */
+ sip->i_d.di_parent = cpu_to_be64(tdp->i_ino);
+ sip->i_d.di_poffset = cpu_to_be32(offset);
+ xfs_trans_log_inode(tp, sip, XFS_ILOG_CORE);
+ offset = 0;
+ }
+
xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
@@ -1435,7 +1451,28 @@ xfs_link(
goto abort_return;
}
- return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ return error;
+
+ if (xfs_sb_version_hasparent(&mp->m_sb) && offset) {
+ struct xfs_pattr p_entry;
+ struct xfs_name p_name;
+
+ /*
+ * The parent pointer information could not be stored
+ * inode core, and must be stored in an extended attribute.
+ */
+ xfs_parent_pname(tdp->i_ino, offset, &p_entry, &p_name);
+ error = xfs_attr_set(sip, (char *) &p_name, NULL, 0,
+ ATTR_PARENT|ATTR_CREATE);
+ if (error)
+ xfs_notice(mp,
+ "%s: attr %llx/%x failed inode %llx %d\n",
+ __func__, tdp->i_ino, offset, sip->i_ino,
+ error);
+ }
+ return error;
abort_return:
cancel_flags |= XFS_TRANS_ABORT;
Index: b/fs/xfs/xfs_parent.h
===================================================================
--- /dev/null
+++ b/fs/xfs/xfs_parent.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (c) 2014 SGI
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef __XFS_PARENT_H__
+#define __XFS_PARENT_H__
+
+#define XFSREMOVESRC (1 << 0)
+#define XFSREMOVETAR (1 << 1)
+#define XFSADD (1 << 2)
+
+struct xfs_pattr {
+ __be64 p_ino;
+ __be32 p_offset;
+} __attribute__((packed));
+
+static inline void xfs_parent_pname(xfs_ino_t ino, __uint32_t off, struct xfs_pattr *pe, struct xfs_name *pn)
+{
+ pe->p_ino = cpu_to_be64(ino);
+ pe->p_offset = cpu_to_be32(off);
+ pn->name = (char *)pe;
+ pn->len = sizeof(struct xfs_pattr);
+}
+#endif /* __XFS_PARENT_H__ */
+
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 10/17] xfs: (parent ptr) add parent pointer support to xfs_remove
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (8 preceding siblings ...)
2014-01-15 22:00 ` [RFC 09/17] xfs: (parent ptr) add parent pointer support to xfs_link Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 11/17] xfs: (parent ptr) add parent pointer support to xfs_rename Mark Tinguely
` (7 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 10-pptr-add-to_xfs_remove.patch --]
[-- Type: text/plain, Size: 3685 bytes --]
Remove the parent inode / offset entry when removing the
link. The entry could be in the inode core or an extended
attribute.
---
fs/xfs/xfs_attr.c | 16 ++++++++++------
fs/xfs/xfs_inode.c | 36 ++++++++++++++++++++++++++++++++----
2 files changed, 42 insertions(+), 10 deletions(-)
Index: b/fs/xfs/xfs_attr.c
===================================================================
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -466,7 +466,7 @@ xfs_attr_set(
* Generic handler routine to remove a name from an attribute list.
* Transitions attribute list from Btree to shortform as necessary.
*/
-STATIC int
+int
xfs_attr_remove_int(xfs_inode_t *dp, struct xfs_name *name, int flags)
{
xfs_da_args_t args;
@@ -594,16 +594,20 @@ xfs_attr_remove(
int flags)
{
int error;
- struct xfs_name xname;
+ struct xfs_name xname, *xnamep;
XFS_STATS_INC(xs_attr_remove);
if (XFS_FORCED_SHUTDOWN(dp->i_mount))
return (EIO);
- error = xfs_attr_name_to_xname(&xname, name);
- if (error)
- return error;
+ if ((flags & ATTR_PARENT) == 0) {
+ xnamep = &xname;
+ error = xfs_attr_name_to_xname(xnamep, name);
+ if (error)
+ return error;
+ } else
+ xnamep = (struct xfs_name *) name;
xfs_ilock(dp, XFS_ILOCK_SHARED);
if (!xfs_inode_hasattr(dp)) {
@@ -612,7 +616,7 @@ xfs_attr_remove(
}
xfs_iunlock(dp, XFS_ILOCK_SHARED);
- return xfs_attr_remove_int(dp, &xname, flags);
+ return xfs_attr_remove_int(dp, xnamep, flags);
}
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2493,6 +2493,7 @@ xfs_remove(
int link_zero;
uint resblks;
uint log_count;
+ uint offset;
trace_xfs_remove(dp, name);
@@ -2585,13 +2586,24 @@ xfs_remove(
link_zero = (ip->i_d.di_nlink == 0);
xfs_bmap_init(&free_list, &first_block);
- error = xfs_dir_removename(tp, dp, name, ip->i_ino,
- &first_block, &free_list, resblks, NULL);
+ error = xfs_dir_removename(tp, dp, name, ip->i_ino, &first_block,
+ &free_list, resblks, &offset);
if (error) {
ASSERT(error != ENOENT);
goto out_bmap_cancel;
}
+ if (xfs_sb_version_hasparent(&mp->m_sb) &&
+ atomic_read(&VFS_I(ip)->i_count) > 1 && !link_zero &&
+ ip->i_d.di_parent == cpu_to_be64(dp->i_ino) &&
+ ip->i_d.di_poffset == cpu_to_be32(offset)) {
+ /* Remove the incore parent information */
+ ip->i_d.di_parent = NULLFSINO;
+ ip->i_d.di_poffset = 0;
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ offset = 0;
+ }
+
/*
* If this is a synchronous mount, make sure that the
* remove transaction goes to disk before returning to
@@ -2608,6 +2620,22 @@ xfs_remove(
if (error)
goto std_return;
+ if (xfs_sb_version_hasparent(&mp->m_sb) &&
+ atomic_read(&VFS_I(ip)->i_count) > 1 && !link_zero && offset) {
+ struct xfs_pattr p_entry;
+ struct xfs_name p_name;
+
+ xfs_parent_pname(dp->i_ino, offset, &p_entry, &p_name);
+ error = xfs_attr_remove(ip, (char *) &p_name, ATTR_PARENT);
+ if (error) {
+ xfs_notice(mp,
+ "%s: attr %llx/%x failed inode %llx %d\n",
+ __func__, dp->i_ino, offset,
+ ip->i_ino, error);
+ if (error == ENOATTR)
+ error = 0;
+ }
+ }
/*
* If we are using filestreams, kill the stream association.
* If the file is still open it may get a new one but that
@@ -2617,7 +2645,7 @@ xfs_remove(
if (!is_dir && link_zero && xfs_inode_is_filestream(ip))
xfs_filestream_deassociate(ip);
- return 0;
+ return error;
out_bmap_cancel:
xfs_bmap_cancel(&free_list);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 11/17] xfs: (parent ptr) add parent pointer support to xfs_rename
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (9 preceding siblings ...)
2014-01-15 22:00 ` [RFC 10/17] xfs: (parent ptr) add parent pointer support to xfs_remove Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 12/17] xfs: (parent ptr) add parent pointer support for user space Mark Tinguely
` (6 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 11-pptr-add-incore-to-rename.patch --]
[-- Type: text/plain, Size: 9616 bytes --]
Add the parent inode / offset support for xfs_rename. This
could add the new extry to inode core, if the entry is empty
or the old entry was stored there, else store the new entry
in an extended attribute. Removal of the old entry from
an inode core or extended attribute is also done.
---
fs/xfs/Makefile | 1
fs/xfs/xfs_inode.c | 69 +++++++++++++++++++++-
fs/xfs/xfs_parent.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_parent.h | 5 +
4 files changed, 228 insertions(+), 5 deletions(-)
Index: b/fs/xfs/Makefile
===================================================================
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -49,6 +49,7 @@ xfs-y += xfs_aops.o \
xfs_message.o \
xfs_mount.o \
xfs_mru_cache.o \
+ xfs_parent.o \
xfs_super.o \
xfs_symlink.o \
xfs_trans.o \
Index: b/fs/xfs/xfs_inode.c
===================================================================
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2728,6 +2728,9 @@ xfs_rename(
xfs_inode_t *inodes[4];
int spaceres;
int num_inodes;
+ int need_action = 0;
+ uint src_offset = 0;
+ uint tar_offset = 0;
trace_xfs_rename(src_dp, target_dp, src_name, target_name);
@@ -2808,8 +2811,8 @@ xfs_rename(
* to account for the ".." reference from the new entry.
*/
error = xfs_dir_createname(tp, target_dp, target_name,
- src_ip->i_ino, &first_block,
- &free_list, spaceres, NULL);
+ src_ip->i_ino, &first_block,
+ &free_list, spaceres, &tar_offset);
if (error == ENOSPC)
goto error_return;
if (error)
@@ -2851,7 +2854,7 @@ xfs_rename(
*/
error = xfs_dir_replace(tp, target_dp, target_name,
src_ip->i_ino, &first_block,
- &free_list, spaceres, NULL);
+ &free_list, spaceres, &tar_offset);
if (error)
goto abort_return;
@@ -2919,10 +2922,16 @@ xfs_rename(
}
error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
- &first_block, &free_list, spaceres, NULL);
+ &first_block, &free_list, spaceres,
+ &src_offset);
if (error)
goto abort_return;
+ if (xfs_sb_version_hasparent(&mp->m_sb))
+ need_action = xfs_pptr_rename(mp, tp, src_dp, target_dp,
+ src_ip, target_ip, src_offset,
+ tar_offset);
+
xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
if (new_parent)
@@ -2949,7 +2958,57 @@ xfs_rename(
* trans_commit will unlock src_ip, target_ip & decrement
* the vnode references.
*/
- return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ error = xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
+ if (error)
+ return error;
+
+ if (xfs_sb_version_hasparent(&mp->m_sb) && need_action) {
+ struct xfs_pattr p_entry;
+ struct xfs_name p_name;
+
+ if ((need_action & XFSREMOVESRC) /* &&
+ atomic_read(&VFS_I(src_ip)->i_count) > 1 &&
+ src_ip->i_d.di_nlink != 0*/) {
+ xfs_parent_pname(src_dp->i_ino, src_offset, &p_entry,
+ &p_name);
+ error = xfs_attr_remove(src_ip, (char *) &p_name,
+ ATTR_PARENT);
+ if (error)
+ xfs_notice(mp,
+ "%s: rm src attr %llx/%x failed inode %llx %d\n",
+ __func__, src_dp->i_ino, src_offset,
+ src_ip->i_ino, error);
+ }
+
+ if (target_ip && (need_action & XFSREMOVETAR) &&
+ atomic_read(&VFS_I(target_ip)->i_count) > 1 &&
+ target_ip->i_d.di_nlink != 0) {
+ xfs_parent_pname(target_dp->i_ino, tar_offset, &p_entry,
+ &p_name);
+ error = xfs_attr_remove(target_ip, (char *) &p_name,
+ ATTR_PARENT);
+ if (error)
+ xfs_notice(mp,
+ "%s: rm tar attr %llx/%x failed inode %llx %d\n",
+ __func__, target_dp->i_ino,
+ tar_offset, target_ip->i_ino, error);
+ }
+
+ if (need_action & XFSADD) {
+ xfs_parent_pname(target_dp->i_ino, tar_offset, &p_entry,
+ &p_name);
+ error = xfs_attr_set(src_ip, (char *) &p_name, NULL, 0,
+ ATTR_PARENT|ATTR_CREATE);
+ if (error) {
+ xfs_notice(mp,
+ "%s: rm add attr %llx/%x failed inode %llx %d\n",
+ __func__, target_dp->i_ino, tar_offset, src_ip->i_ino,
+ error);
+// goto unlock_return;
+ }
+ }
+ }
+ return error;
abort_return:
cancel_flags |= XFS_TRANS_ABORT;
Index: b/fs/xfs/xfs_parent.c
===================================================================
--- /dev/null
+++ b/fs/xfs/xfs_parent.c
@@ -0,0 +1,158 @@
+/*
+ * Copyright (c) 2014 SGI
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_inum.h"
+#include "xfs_sb.h"
+#include "xfs_ag.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_attr_sf.h"
+#include "xfs_attr.h"
+#include "xfs_trans_space.h"
+#include "xfs_trans.h"
+#include "xfs_buf_item.h"
+#include "xfs_inode_item.h"
+#include "xfs_ialloc.h"
+#include "xfs_bmap.h"
+#include "xfs_bmap_util.h"
+#include "xfs_error.h"
+#include "xfs_quota.h"
+#include "xfs_dinode.h"
+#include "xfs_filestream.h"
+#include "xfs_cksum.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_ioctl.h"
+#include "xfs_trace.h"
+#include "xfs_icache.h"
+#include "xfs_symlink.h"
+#include "xfs_trans_priv.h"
+#include "xfs_log.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_parent.h"
+
+int
+xfs_pptr_rename(
+ struct xfs_mount *mp,
+ struct xfs_trans *tp,
+ struct xfs_inode *src_dp,
+ struct xfs_inode *target_dp,
+ struct xfs_inode *src_ip,
+ struct xfs_inode *target_ip,
+ uint src_offset,
+ uint tar_offset)
+{
+ int64_t be_src_ino = cpu_to_be64(src_dp->i_ino);
+ __uint64_t be_tar_ino = cpu_to_be64(target_dp->i_ino);
+ __uint32_t be_src_offset = cpu_to_be32(src_offset);
+ __uint32_t be_tar_offset = cpu_to_be32(tar_offset);
+
+ ASSERT(xfs_sb_version_hasparent(&mp->m_sb));
+
+ if (target_ip &&
+ src_ip->i_d.di_parent == be_tar_ino &&
+ src_ip->i_d.di_poffset == be_tar_offset) {
+ /*
+ * If the target exists and is already in the source
+ * inode entry, then just remove the source entry.
+ */
+ return XFSREMOVESRC;
+
+ }
+
+ if (src_ip->i_d.di_parent == be_src_ino &&
+ src_ip->i_d.di_poffset == be_src_offset) {
+ /*
+ * The destination contains the old entry, then replace
+ * it with the new entry. Remove the target name if it
+ * existed.
+ */
+ src_ip->i_d.di_parent = be_tar_ino;
+ src_ip->i_d.di_poffset = be_tar_offset;
+ xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE);
+
+ if (target_ip) {
+ if (target_ip != src_ip &&
+ target_ip->i_d.di_parent == be_tar_ino &&
+ target_ip->i_d.di_poffset == be_tar_offset) {
+ target_ip->i_d.di_parent = NULLFSINO;
+ target_ip->i_d.di_poffset = 0;
+ xfs_trans_log_inode(tp, target_ip,
+ XFS_ILOG_CORE);
+ return 0;
+ } else
+ return XFSREMOVETAR;
+ }
+ return 0;
+ }
+
+ if (src_ip->i_d.di_parent == NULLFSINO) {
+ /* place entry in empty inode area */
+ src_ip->i_d.di_parent = be_tar_ino;
+ src_ip->i_d.di_poffset = be_tar_offset;
+ xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE);
+
+ /*
+ * we know the old src is in the extend attribute so
+ * remove it. if the target exist, remove it also.
+ */
+ if (target_ip) {
+ if (target_ip != src_ip &&
+ target_ip->i_d.di_parent == be_tar_ino &&
+ target_ip->i_d.di_poffset == be_tar_offset) {
+ target_ip->i_d.di_parent = NULLFSINO;
+ target_ip->i_d.di_poffset = 0;
+ xfs_trans_log_inode(tp, target_ip,
+ XFS_ILOG_CORE);
+ return XFSREMOVESRC;
+ } else
+ return XFSREMOVETAR | XFSREMOVESRC;
+ } else
+ return XFSREMOVESRC;
+ }
+
+ if (src_ip != target_ip) {
+ if (target_ip) {
+ if (target_ip->i_d.di_parent == be_tar_ino &&
+ target_ip->i_d.di_poffset == be_tar_offset) {
+ target_ip->i_d.di_parent = NULLFSINO;
+ target_ip->i_d.di_poffset = 0;
+ xfs_trans_log_inode(tp, target_ip,
+ XFS_ILOG_CORE);
+ return XFSREMOVESRC | XFSADD;
+ } else
+ return XFSREMOVETAR | XFSREMOVESRC | XFSADD;
+ } else
+ return XFSREMOVESRC | XFSADD;
+ }
+
+ /*
+ * The extended attribute already exists in the src_ip.
+ * Remove the src name from the extend attribute
+ */
+ return XFSREMOVESRC;
+}
Index: b/fs/xfs/xfs_parent.h
===================================================================
--- a/fs/xfs/xfs_parent.h
+++ b/fs/xfs/xfs_parent.h
@@ -35,5 +35,10 @@ static inline void xfs_parent_pname(xfs_
pn->name = (char *)pe;
pn->len = sizeof(struct xfs_pattr);
}
+
+int xfs_pptr_rename(struct xfs_mount *mp, struct xfs_trans *tp,
+ struct xfs_inode *src_dp, struct xfs_inode *target_dp,
+ struct xfs_inode *src_ip, struct xfs_inode *target_ip,
+ uint src_offset, uint tar_offset);
#endif /* __XFS_PARENT_H__ */
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 12/17] xfs: (parent ptr) add parent pointer support for user space
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (10 preceding siblings ...)
2014-01-15 22:00 ` [RFC 11/17] xfs: (parent ptr) add parent pointer support to xfs_rename Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 13/17] xfsprogs: add parent pointer support into Linux 3.10 inode 3 Mark Tinguely
` (5 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 12-pptr-add-incore-to-getparentpaths_by_handle.patch --]
[-- Type: text/plain, Size: 17487 bytes --]
Add the parent inode / offset support for user space.
This adds two new ioctls to provide the names or paths
for an inode that is pointed to by handle.
---
fs/xfs/xfs_attr.h | 2
fs/xfs/xfs_fs.h | 10 +
fs/xfs/xfs_ioctl.c | 72 ++++++++
fs/xfs/xfs_parent.c | 435 ++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_parent.h | 30 +++
5 files changed, 548 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_attr.h
===================================================================
--- a/fs/xfs/xfs_attr.h
+++ b/fs/xfs/xfs_attr.h
@@ -131,6 +131,8 @@ typedef struct xfs_attr_list_context {
int put_value; /* T/F: need value for listent */
put_listent_func_t put_listent; /* list output fmt function */
int index; /* index into output buffer */
+ char *scratch; /* parent private buf */
+ int srtchlen; /* parent priv buf len */
} xfs_attr_list_context_t;
Index: b/fs/xfs/xfs_fs.h
===================================================================
--- a/fs/xfs/xfs_fs.h
+++ b/fs/xfs/xfs_fs.h
@@ -438,6 +438,14 @@ typedef struct xfs_fsop_attrmulti_handle
struct xfs_attr_multiop __user *ops; /* attr_multi data */
} xfs_fsop_attrmulti_handlereq_t;
+typedef struct xfs_fsop_parentlist_handlereq {
+ struct xfs_fsop_handlereq hreq; /* handle interface structure */
+ u32 flags; /* which namespace to use */
+ u32 buflen; /* length of supplied buffer */
+ __s32 __user *ocount; /* output count ptr */
+ void __user *buffer; /* returned names */
+} xfs_fsop_parentlist_handlereq_t;
+
/*
* per machine unique filesystem identifier types.
*/
@@ -553,6 +561,8 @@ typedef struct xfs_swapext
#define XFS_IOC_ATTRMULTI_BY_HANDLE _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
#define XFS_IOC_FSGEOMETRY _IOR ('X', 124, struct xfs_fsop_geom)
#define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
+#define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 126, struct xfs_fsop_parentlist_handlereq)
+#define XFS_IOC_GETPARENTPATHS_BY_HANDLE _IOWR('X', 127, struct xfs_fsop_parentlist_handlereq)
/* XFS_IOC_GETFSUUID ---------- deprecated 140 */
Index: b/fs/xfs/xfs_ioctl.c
===================================================================
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -42,6 +42,7 @@
#include "xfs_symlink.h"
#include "xfs_dinode.h"
#include "xfs_trans.h"
+#include "xfs_parent.h"
#include <linux/capability.h>
#include <linux/dcache.h>
@@ -632,6 +633,73 @@ xfs_attrmulti_by_handle(
return -error;
}
+STATIC int
+xfs_getparent_by_handle(
+ struct file *parfilp,
+ void __user *arg,
+ unsigned int cmd)
+{
+ int error = -ENOMEM;
+ int count = 0;
+ struct xfs_fsop_parentlist_handlereq pl_hreq;
+ struct dentry *dentry;
+ char *kbuf;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -XFS_ERROR(EPERM);
+
+ if (copy_from_user(&pl_hreq, arg, sizeof(pl_hreq)))
+ return -XFS_ERROR(EFAULT);
+
+ if (pl_hreq.buflen > XATTR_LIST_MAX)
+ return -XFS_ERROR(EINVAL);
+
+ /*
+ * Reject flags, only allow namespaces.
+ */
+ if (pl_hreq.flags & ~(ATTR_PARENT))
+ return -XFS_ERROR(EINVAL);
+
+ dentry = xfs_handlereq_to_dentry(parfilp, &pl_hreq.hreq);
+ if (IS_ERR(dentry))
+ return PTR_ERR(dentry);
+
+ kbuf = kmem_zalloc_large(pl_hreq.buflen, KM_SLEEP);
+ if (!kbuf)
+ goto out_dput;
+
+ switch (cmd) {
+ case XFS_IOC_GETPARENTS_BY_HANDLE:
+ error = -xfs_parents_attr_list(XFS_I(dentry->d_inode), kbuf,
+ pl_hreq.buflen, &count);
+ break;
+ case XFS_IOC_GETPARENTPATHS_BY_HANDLE:
+ error = -xfs_parentpaths_attr_list(XFS_I(dentry->d_inode),
+ kbuf, pl_hreq.buflen, &count);
+ break;
+ default:
+ error = -ENOTTY;
+ }
+ if (error)
+ goto out_kfree;
+
+ if (copy_to_user(pl_hreq.buffer, kbuf, pl_hreq.buflen)) {
+ error = -XFS_ERROR(EFAULT);
+ goto out_kfree;
+ }
+
+ if (pl_hreq.ocount != NULL) {
+ if (copy_to_user(pl_hreq.ocount, &count, sizeof(count)))
+ error = -XFS_ERROR(EFAULT);
+ }
+
+ out_kfree:
+ kmem_free(kbuf);
+ out_dput:
+ dput(dentry);
+ return error;
+}
+
int
xfs_ioc_space(
struct xfs_inode *ip,
@@ -1675,6 +1743,10 @@ xfs_file_ioctl(
case XFS_IOC_ATTRMULTI_BY_HANDLE:
return xfs_attrmulti_by_handle(filp, arg);
+ case XFS_IOC_GETPARENTS_BY_HANDLE:
+ case XFS_IOC_GETPARENTPATHS_BY_HANDLE:
+ return xfs_getparent_by_handle(filp, arg, cmd);
+
case XFS_IOC_SWAPEXT: {
struct xfs_swapext sxp;
Index: b/fs/xfs/xfs_parent.c
===================================================================
--- a/fs/xfs/xfs_parent.c
+++ b/fs/xfs/xfs_parent.c
@@ -156,3 +156,438 @@ xfs_pptr_rename(
*/
return XFSREMOVESRC;
}
+
+/* the fill function for xfs_readdir */
+static int
+xfs_gpfill(
+ void *gfill,
+ const char *name,
+ int namlen,
+ loff_t offset,
+ u64 ino,
+ unsigned int type)
+{
+ struct xfs_pfillinfo *gfp = (struct xfs_pfillinfo *) gfill;
+
+ if (offset != gfp->gp_off) /*XXX mft remove */
+ return 0;
+
+ namlen = MIN(namlen, gfp->gp_len);
+ strncpy(gfp->gp_path + gfp->gp_len - namlen, name, namlen);
+ gfp->gp_len -= namlen;
+ return 1; /* return a 1 to stop */
+}
+
+/* save name of entry at directory pino/offset to the buffer (path) */
+int
+xfs_getpname(
+ struct xfs_mount *mp, /* IN mount point */
+ xfs_ino_t pino, /* IN starting parent ino */
+ xfs_off_t offset, /* IN starting dir offset */
+ char *path, /* OUT buffer to hold path */
+ int *plen) /* IN/OUT pref MAXPATHLEN+1 */
+{
+ int error = 0;
+ struct xfs_inode *dp = NULL;
+ struct xfs_pfillinfo gfill = {
+ .ctx.actor = xfs_gpfill
+ };
+
+ ASSERT(path != NULL);
+
+ gfill.ctx.pos = offset;
+ gfill.gp_off = offset;
+ gfill.gp_path = path;
+ gfill.gp_path[*plen-1] = '\0';
+ memset(path, 0, *plen);
+ gfill.gp_len = *plen-1;
+
+ error = xfs_iget(mp, NULL, pino, 0, XFS_ILOCK_SHARED, &dp);
+ if (error)
+ return error;
+
+ error = xfs_readdir(dp, &(gfill.ctx), *plen);
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+ IRELE(dp);
+ *plen = *plen - 1 - gfill.gp_len;
+ return error;
+}
+
+STATIC int
+xfs_attr_paths_put_listent(
+ struct xfs_attr_list_context *context,
+ int flags,
+ unsigned char *name,
+ int namelen,
+ int valuelen,
+ unsigned char *value)
+{
+ struct xfs_pattr *p_attr;
+ struct xfs_parent *p_entry;
+
+ ASSERT(!(context->flags & ATTR_KERNOVAL));
+ ASSERT(context->count >= 0);
+ ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+ ASSERT(namelen == sizeof(struct xfs_pattr));
+
+ /*
+ * Only list entries in the right namespace.
+ */
+ if ((context->flags & ATTR_PARENT) == 0)
+ return 0;
+
+ if (namelen != sizeof(struct xfs_pattr))
+ return 0;
+
+ p_attr = (struct xfs_pattr *) name;
+
+ p_entry = (struct xfs_parent *)context->alist;
+ p_entry->p_ino = be64_to_cpu(p_attr->p_ino);
+ p_entry->p_offset = be32_to_cpu(p_attr->p_offset);
+ context->seen_enough = 1;
+ return 0;
+}
+
+/* save path of entry at directory pino/offset to the buffer (path) */
+int
+xfs_getpath(
+ struct xfs_mount *mp, /* IN mount point */
+ xfs_ino_t pino, /* IN starting parent ino */
+ xfs_off_t offset, /* IN starting dir offset */
+ char *path, /* OUT buffer to hold path */
+ int *plen) /* IN/OUT pref MAXPATHLEN+1 */
+{
+ int error = 0;
+ uint32_t dotdot;
+ struct xfs_inode *dp = NULL;
+ struct xfs_pfillinfo gfill = {
+ .ctx.actor = xfs_gpfill
+ };
+
+ ASSERT(path != NULL);
+
+ gfill.ctx.pos = offset;
+ gfill.gp_off = offset;
+ gfill.gp_path = path;
+ gfill.gp_path[*plen-1] = '\0';
+ memset(path, 0, *plen);
+ gfill.gp_len = *plen-1;
+
+
+ error = xfs_iget(mp, NULL, pino, 0, XFS_ILOCK_SHARED, &dp);
+ if (error)
+ return error;
+
+ dotdot = dp->d_ops->data_dotdot_offset >> XFS_DIR2_DATA_ALIGN_LOG;
+
+ /*
+ * start with the specified parent inode/offset. save the current
+ * name and then loop through the parent directories filling in
+ * the current directory name until the mount point is reached or
+ * the buffer is full.
+ */
+
+ while (gfill.gp_len > 1 ) {
+ gfill.gp_off = offset; /*XXX mft remove */
+ gfill.ctx.pos = offset;
+ error = xfs_readdir(dp, &(gfill.ctx), *plen);
+ if (error)
+ goto out_irelse;
+
+ /* look for the parent attribute in the current directory */
+ pino = be64_to_cpu(dp->i_d.di_parent);
+ offset = be32_to_cpu(dp->i_d.di_poffset);
+
+ /* stop at root directory */
+ if (pino == mp->m_sb.sb_rootino && offset <= dotdot)
+ break;
+
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+ IRELE(dp);
+
+ error = xfs_iget(mp, NULL, pino, 0, XFS_ILOCK_SHARED, &dp);
+ if (error)
+ goto out_irelse;
+
+ if (gfill.gp_len < 1)
+ break;
+
+ /* there is another parent directory - add dir slash */
+ gfill.gp_len--;
+ *(gfill.gp_path + gfill.gp_len) = '/';
+ }
+
+ *plen = *plen - 1 - gfill.gp_len;
+
+ out_irelse:
+ xfs_iunlock(dp, XFS_ILOCK_SHARED);
+ IRELE(dp);
+ return error;
+}
+
+/*
+ * Format an attribute and copy it out to the user's buffer.
+ * Take care to check values and protect against them changing later,
+ * we may be reading them directly out of a user buffer.
+ */
+/*ARGSUSED*/
+STATIC int
+xfs_attr_parents_put_listent(
+ struct xfs_attr_list_context *context,
+ int flags,
+ unsigned char *name,
+ int namelen,
+ int valuelen,
+ unsigned char *value)
+{
+ int error;
+ int len;
+ int reclen;
+ struct xfs_pattr *p_attr;
+ struct xfs_parent *p_entry;
+
+ ASSERT(!(context->flags & ATTR_KERNOVAL));
+ ASSERT(context->count >= 0);
+ ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+ ASSERT(namelen == sizeof(struct xfs_pattr));
+
+ /*
+ * Only list entries in the right namespace.
+ */
+ if ((context->flags & ATTR_PARENT) == 0)
+ return 0;
+
+ if (namelen != sizeof(struct xfs_pattr))
+ return 0;
+
+ len = context->srtchlen;
+ p_attr = (struct xfs_pattr *) name;
+ error = xfs_getpname(context->dp->i_mount, be64_to_cpu(p_attr->p_ino),
+ be32_to_cpu(p_attr->p_offset), context->scratch, &len);
+ if (error)
+ return 0;
+
+ /* round up to the nearest 8 byte alignment */
+ reclen = PARENT_ENTSIZE(namelen);
+ if ((context->index + reclen) >= context->bufsize) {
+ context->seen_enough = 1;
+ return 0;
+ }
+
+ p_entry = (struct xfs_parent *)(context->alist + context->index);
+ p_entry->p_ino = be64_to_cpu(p_attr->p_ino);
+ p_entry->p_offset = be32_to_cpu(p_attr->p_offset);
+
+ /*
+ * xfs_getpname() will write the name at the end of the buffer
+ * and null terminated. Copy the path and null to xfs_parent_t
+ */
+ memcpy(p_entry->p_name, context->scratch + context->srtchlen -1 - len,
+ len + 1);
+ p_entry->p_reclen = reclen;
+
+ context->index += reclen;
+ context->count++;
+ return 0;
+}
+
+/*
+ * Format an attribute and copy it out to the user's buffer.
+ * Take care to check values and protect against them changing later,
+ * we may be reading them directly out of a user buffer.
+ */
+/*ARGSUSED*/
+STATIC int
+xfs_attr_parentpaths_put_listent(
+ struct xfs_attr_list_context *context,
+ int flags,
+ unsigned char *name,
+ int namelen,
+ int valuelen,
+ unsigned char *value)
+{
+ int error;
+ int len;
+ int reclen;
+ struct xfs_pattr *p_attr;
+ struct xfs_parent *p_entry;
+
+ ASSERT(!(context->flags & ATTR_KERNOVAL));
+ ASSERT(context->count >= 0);
+ ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+ ASSERT(namelen == sizeof(struct xfs_pattr));
+
+ /*
+ * Only list entries in the right namespace.
+ */
+ if ((context->flags & ATTR_PARENT) == 0)
+ return 0;
+
+ if (namelen != sizeof(struct xfs_pattr))
+ return 0;
+
+ len = context->srtchlen;
+ p_attr = (struct xfs_pattr *) name;
+ error = xfs_getpath(context->dp->i_mount, be64_to_cpu(p_attr->p_ino),
+ be32_to_cpu(p_attr->p_offset), context->scratch, &len);
+ if (error)
+ return 0;
+
+ /* round up to the nearest 8 byte alignment */
+ reclen = PARENT_ENTSIZE(len);
+ if ((context->index + reclen) >= context->bufsize) {
+ context->seen_enough = 1;
+ return 0;
+ }
+
+ p_entry = (struct xfs_parent *)(context->alist + context->index);
+ p_entry->p_ino = be64_to_cpu(p_attr->p_ino);
+ p_entry->p_offset = be32_to_cpu(p_attr->p_offset);
+
+ /*
+ * xfs_getpath() will write the path at the end of the buffer
+ * and null terminated. Copy the path and null to xfs_parent_t
+ */
+ memcpy(p_entry->p_name, context->scratch + context->srtchlen - 1 - len,
+ len + 1);
+ p_entry->p_reclen = reclen;
+
+ context->index += reclen;
+ context->count++;
+ return 0;
+}
+
+/*
+ * Generate a list of extended attribute parent directory path names */
+int
+xfs_parents_attr_list(
+ struct xfs_inode *dp,
+ char *buffer,
+ int bufsize,
+ int *count)
+{
+ struct attrlist_cursor_kern cursor;
+ struct xfs_attr_list_context context;
+ int error;
+
+ /*
+ * Check for a properly aligned buffer.
+ */
+ if (((long)buffer) & (sizeof(int)-1))
+ return XFS_ERROR(EFAULT);
+
+ /*
+ * Initialize the output buffer.
+ */
+ memset(&cursor, 0, sizeof(cursor));
+ memset(&context, 0, sizeof(context));
+ context.dp = dp;
+ context.cursor = &cursor;
+ context.resynch = 1;
+ context.flags = ATTR_PARENT;
+ context.alist = buffer;
+ context.bufsize = (bufsize & ~(sizeof(int)-1)); /* align */
+ context.firstu = context.bufsize;
+ context.put_listent = xfs_attr_parents_put_listent;
+ context.scratch = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+ context.srtchlen = MAXPATHLEN+1;
+ *(context.scratch+MAXPATHLEN) = '\0';
+
+ if (dp->i_d.di_parent != NULLFSINO) {
+ struct xfs_pattr p_attr;
+
+ /* copy the incore entry */
+ p_attr.p_ino = dp->i_d.di_parent;
+ p_attr.p_offset = dp->i_d.di_poffset;
+ xfs_attr_parents_put_listent(&context, ATTR_PARENT,
+ (char *)&p_attr, sizeof(p_attr), 0, NULL);
+ if (context.seen_enough) {
+ *count = context.count;
+ error = ERANGE;
+ goto freeout;
+ }
+ }
+
+ /* copy the parent extended attribute entries */
+ error = xfs_attr_list_int(&context);
+
+ *count = context.count;
+ if (context.seen_enough) {
+ /*
+ * The buffer is full and we return error.
+ * User would need to recall us with a bigger buffer.
+ */
+ error = ERANGE;
+ }
+ freeout:
+ kfree(context.scratch);
+ return XFS_ERROR(error);
+}
+
+/*
+ * Generate a list of extended attribute parent directory path names
+ */
+int
+xfs_parentpaths_attr_list(
+ struct xfs_inode *dp,
+ char *buffer,
+ int bufsize,
+ int *count)
+{
+ struct attrlist_cursor_kern cursor;
+ struct xfs_attr_list_context context;
+ int error;
+
+ /*
+ * Check for a properly aligned buffer.
+ */
+ if (((long)buffer) & (sizeof(int) - 1))
+ return XFS_ERROR(EFAULT);
+
+ /*
+ * Initialize the output buffer.
+ */
+ memset(&cursor, 0, sizeof(cursor));
+ memset(&context, 0, sizeof(context));
+ context.dp = dp;
+ context.cursor = &cursor;
+ context.resynch = 1;
+ context.flags = ATTR_PARENT;
+ context.alist = buffer;
+ context.bufsize = (bufsize & ~(sizeof(int) - 1)); /* align */
+ context.firstu = context.bufsize;
+ context.put_listent = xfs_attr_parentpaths_put_listent;
+ context.scratch = kmalloc(MAXPATHLEN+1, GFP_KERNEL);
+ context.srtchlen = MAXPATHLEN+1;
+ *(context.scratch+MAXPATHLEN) = '\0';
+
+ if (dp->i_d.di_parent != NULLFSINO) {
+ struct xfs_pattr p_attr;
+
+ /* copy the incore entry */
+ p_attr.p_ino = dp->i_d.di_parent;
+ p_attr.p_offset = dp->i_d.di_poffset;
+ xfs_attr_parentpaths_put_listent(&context, ATTR_PARENT,
+ (char *)&p_attr, sizeof(p_attr), 0, NULL);
+ if (context.seen_enough) {
+ *count = context.count;
+ error = ERANGE;
+ goto freeout;
+ }
+ }
+
+ /* copy the parent extended attribute entries */
+ error = xfs_attr_list_int(&context);
+
+ *count = context.count;
+ if (context.seen_enough) {
+ /*
+ * The buffer is full and we return error.
+ * User would need to recall us with a bigger buffer.
+ */
+ error = ERANGE;
+ }
+ freeout:
+ kfree(context.scratch);
+ return XFS_ERROR(error);
+}
Index: b/fs/xfs/xfs_parent.h
===================================================================
--- a/fs/xfs/xfs_parent.h
+++ b/fs/xfs/xfs_parent.h
@@ -28,6 +28,22 @@ struct xfs_pattr {
__be32 p_offset;
} __attribute__((packed));
+/* xfs_readdir fill program private data */
+struct xfs_pfillinfo {
+ struct dir_context ctx; /* must be first entry */
+ int gp_len;
+ int gp_off;
+ char *gp_path;
+};
+
+typedef struct xfs_parent {
+ uint64_t p_ino; /* parent inode number */
+ uint32_t p_offset; /* entry offset in parent inode */
+ uint16_t p_reclen; /* name length */
+ uint16_t p_pad; /* padding for future */
+ char p_name[1]; /* variable length name */
+} xfs_parent_t;
+
static inline void xfs_parent_pname(xfs_ino_t ino, __uint32_t off, struct xfs_pattr *pe, struct xfs_name *pn)
{
pe->p_ino = cpu_to_be64(ino);
@@ -36,9 +52,21 @@ static inline void xfs_parent_pname(xfs_
pn->len = sizeof(struct xfs_pattr);
}
+/* actual bytes used by parent entry */
+#define PARENT_ENTSIZE(namelen) \
+ ((offsetof(struct xfs_parent, p_name) + (namelen) + 1 + \
+ sizeof(uint64_t)-1) & ~(sizeof(uint64_t)-1))
+
+int xfs_getpath(struct xfs_mount *mp, xfs_ino_t pino, xfs_off_t offset,
+ char *path, int *plen);
+int xfs_getpname(struct xfs_mount *mp, xfs_ino_t pino, xfs_off_t offset,
+ char *path, int *plen);
+int xfs_parents_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
+ int *count);
+int xfs_parentpaths_attr_list(struct xfs_inode *dp, char *buffer, int bufsize,
+ int *count);
int xfs_pptr_rename(struct xfs_mount *mp, struct xfs_trans *tp,
struct xfs_inode *src_dp, struct xfs_inode *target_dp,
struct xfs_inode *src_ip, struct xfs_inode *target_ip,
uint src_offset, uint tar_offset);
#endif /* __XFS_PARENT_H__ */
-
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 13/17] xfsprogs: add parent pointer support into Linux 3.10 inode 3
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (11 preceding siblings ...)
2014-01-15 22:00 ` [RFC 12/17] xfs: (parent ptr) add parent pointer support for user space Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 14/17] xfsprogs: add parent pointer values to headers and fix repair Mark Tinguely
` (4 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 01-xfsprogs-add-parent-pointers.patch --]
[-- Type: text/plain, Size: 7508 bytes --]
Add parent pointer support into the inode version 3 (CRC inode).
Make changes to mkfs.xfs to enable parent pointers using the
"-i parent=1" option.
---
include/xfs_dinode.h | 4 +++-
include/xfs_log_format.h | 4 +++-
include/xfs_sb.h | 14 ++++++++++++--
libxfs/xfs_ialloc.c | 12 ++++++++++++
man/man8/mkfs.xfs.8 | 6 ++++++
mkfs/xfs_mkfs.c | 24 +++++++++++++++++++-----
6 files changed, 55 insertions(+), 9 deletions(-)
Index: b/include/xfs_dinode.h
===================================================================
--- a/include/xfs_dinode.h
+++ b/include/xfs_dinode.h
@@ -79,7 +79,9 @@ typedef struct xfs_dinode {
__be64 di_changecount; /* number of attribute changes */
__be64 di_lsn; /* flush sequence */
__be64 di_flags2; /* more random flags */
- __u8 di_pad2[16]; /* more padding for future expansion */
+ __be64 di_parent; /* parent directory inode */
+ __be32 di_poffset; /* offset into parent directory */
+ __u8 di_pad2[4]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_timestamp_t di_crtime; /* time created */
Index: b/include/xfs_log_format.h
===================================================================
--- a/include/xfs_log_format.h
+++ b/include/xfs_log_format.h
@@ -567,7 +567,9 @@ typedef struct xfs_icdinode {
__uint64_t di_changecount; /* number of attribute changes */
xfs_lsn_t di_lsn; /* flush sequence */
__uint64_t di_flags2; /* more random flags */
- __uint8_t di_pad2[16]; /* more padding for future expansion */
+ __uint64_t di_parent; /* parent directory inode */
+ __uint32_t di_poffset; /* offset into parent directory */
+ __uint8_t di_pad2[4]; /* more padding for future expansion */
/* fields only written to during inode creation */
xfs_ictimestamp_t di_crtime; /* time created */
Index: b/include/xfs_sb.h
===================================================================
--- a/include/xfs_sb.h
+++ b/include/xfs_sb.h
@@ -90,6 +90,7 @@ struct xfs_trans;
(XFS_SB_VERSION2_LAZYSBCOUNTBIT | \
XFS_SB_VERSION2_ATTR2BIT | \
XFS_SB_VERSION2_PROJID32BIT | \
+ XFS_SB_VERSION2_PARENTBIT | \
XFS_SB_VERSION2_FTYPE)
#define XFS_SB_VERSION2_OKSASHFBITS \
(0)
@@ -328,7 +329,6 @@ typedef enum {
XFS_SB_FEATURES_RO_COMPAT | XFS_SB_FEATURES_INCOMPAT | \
XFS_SB_FEATURES_LOG_INCOMPAT | XFS_SB_PQUOTINO)
-
/*
* Misc. Flags - warning - these will be cleared by xfs_repair unless
* a feature bit is set when the flag is used.
@@ -596,8 +596,10 @@ xfs_sb_has_ro_compat_feature(
}
#define XFS_SB_FEAT_INCOMPAT_FTYPE (1 << 0) /* filetype in dirent */
+#define XFS_SB_FEAT_INCOMPAT_PARENT (2 << 0) /* parent inode ptrs */
#define XFS_SB_FEAT_INCOMPAT_ALL \
- (XFS_SB_FEAT_INCOMPAT_FTYPE)
+ (XFS_SB_FEAT_INCOMPAT_FTYPE | \
+ XFS_SB_FEAT_INCOMPAT_PARENT)
#define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL
static inline bool
@@ -639,6 +641,14 @@ static inline int xfs_sb_version_hasftyp
(sbp->sb_features2 & XFS_SB_VERSION2_FTYPE));
}
+static inline int xfs_sb_version_hasparent(struct xfs_sb *sbp)
+{
+ return (XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5 &&
+ xfs_sb_has_incompat_feature(sbp, XFS_SB_FEAT_INCOMPAT_PARENT)) ||
+ (xfs_sb_version_hasmorebits(sbp) &&
+ (sbp->sb_features2 & XFS_SB_VERSION2_PARENTBIT));
+}
+
/*
* end of superblock version macros
*/
Index: b/libxfs/xfs_ialloc.c
===================================================================
--- a/libxfs/xfs_ialloc.c
+++ b/libxfs/xfs_ialloc.c
@@ -233,6 +233,18 @@ xfs_ialloc_inode_init(
free->di_next_unlinked = cpu_to_be32(NULLAGINO);
if (version == 3) {
+ if (xfs_sb_version_hasparent(&mp->m_sb) &&
+ (mp->m_sb.sb_rootino == NULLFSINO ||
+ mp->m_sb.sb_rootino == ino)) {
+ /*
+ * set the parent pointer information
+ * in the root inode.
+ */
+ free->di_parent = cpu_to_be64(ino);
+ free->di_poffset =
+ cpu_to_be32(xfs_dir3_data_dotdot_offset(mp)
+ >> XFS_DIR2_DATA_ALIGN_LOG);
+ }
free->di_ino = cpu_to_be64(ino);
ino++;
uuid_copy(&free->di_uuid, &mp->m_sb.sb_uuid);
Index: b/man/man8/mkfs.xfs.8
===================================================================
--- a/man/man8/mkfs.xfs.8
+++ b/man/man8/mkfs.xfs.8
@@ -364,6 +364,12 @@ This is used to enable 32bit quota proje
is either 0 or 1, with 1 signifying that 32bit projid are to be enabled.
If the value is omitted, 1 is assumed. (This default changed
in release version 3.2.0.)
+.TP
+.BI parent[= value ]
+This is used to enable parent inode pointers in CRC enabled filesystems. The
+.I value
+is either 0 or 1, with 1 signifying that parent inode pointers are to be enabled.
+If the value is omitted, 0 is assumed.
.RE
.TP
.BI \-l " log_section_options"
Index: b/mkfs/xfs_mkfs.c
===================================================================
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -109,6 +109,8 @@ char *iopts[] = {
"attr",
#define I_PROJID32BIT 6
"projid32bit",
+#define I_PARENT 7
+ "parent",
NULL
};
@@ -149,7 +151,7 @@ char *nopts[] = {
"version",
#define N_FTYPE 3
"ftype",
- NULL,
+ NULL
};
char *ropts[] = {
@@ -962,6 +964,7 @@ main(
struct fs_topology ft;
int lazy_sb_counters;
int crcs_enabled;
+ int parent;
progname = basename(argv[0]);
setlocale(LC_ALL, "");
@@ -995,6 +998,7 @@ main(
worst_freelist = 0;
lazy_sb_counters = 1;
crcs_enabled = 0;
+ parent = 0;
memset(&fsx, 0, sizeof(fsx));
memset(&xi, 0, sizeof(xi));
@@ -1316,6 +1320,14 @@ main(
illegal(value, "i projid32bit");
projid16bit = c ? 0 : 1;
break;
+ case I_PARENT:
+ if (!value || *value == '\0')
+ value = "0";
+ c = atoi(value);
+ if (c < 0 || c > 1)
+ illegal(value, "i parent");
+ parent = c;
+ break;
default:
unknown('i', value);
}
@@ -2463,12 +2475,14 @@ _("size %s specified for log subvolume i
if (crcs_enabled) {
sbp->sb_features_incompat = XFS_SB_FEAT_INCOMPAT_FTYPE;
dirftype = 1;
- }
+ if (parent)
+ sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_PARENT;
+ }
if (!qflag || Nflag) {
printf(_(
"meta-data=%-22s isize=%-6d agcount=%lld, agsize=%lld blks\n"
- " =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
+ " =%-22s sectsz=%-5u attr=%u, projid32bit=%u, parent=%d\n"
" =%-22s crc=%u\n"
"data =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
" =%-22s sunit=%-6u swidth=%u blks\n"
@@ -2477,7 +2491,7 @@ _("size %s specified for log subvolume i
" =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n"
"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"),
dfile, isize, (long long)agcount, (long long)agsize,
- "", sectorsize, attrversion, !projid16bit,
+ "", sectorsize, attrversion, !projid16bit, parent,
"", crcs_enabled,
"", blocksize, (long long)dblocks, imaxpct,
"", dsunit, dswidth,
@@ -3085,7 +3099,7 @@ usage( void )
sectlog=n|sectsize=num\n\
/* force overwrite */ [-f]\n\
/* inode size */ [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,\n\
- projid32bit=0|1]\n\
+ projid32bit=0|1,parent==0|1]\n\
/* no discard */ [-K]\n\
/* log subvol */ [-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
sunit=value|su=num,sectlog=n|sectsize=num,\n\
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 14/17] xfsprogs: add parent pointer values to headers and fix repair
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (12 preceding siblings ...)
2014-01-15 22:00 ` [RFC 13/17] xfsprogs: add parent pointer support into Linux 3.10 inode 3 Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 15/17] xfsprogs: add basic parent pointer support to xfs_db Mark Tinguely
` (3 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 02-xfsprogs-pptr-fix-namecheck.patch --]
[-- Type: text/plain, Size: 5302 bytes --]
Add the parent inode information to the header files.
Make sure that xfs_repair does not think the non-printable
values in parent pointer extended attribute type is an
indication of a corrupted extened attribute entry.
---
include/libxfs.h | 1 +
include/xfs_da_format.h | 12 ++++++++----
libxfs/xfs.h | 1 +
repair/attr_repair.c | 18 +++++++++++-------
4 files changed, 21 insertions(+), 11 deletions(-)
Index: b/include/libxfs.h
===================================================================
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -603,6 +603,7 @@ typedef struct xfs_inode {
#define LIBXFS_ATTR_SECURE 0x0008 /* use attrs in security namespace */
#define LIBXFS_ATTR_CREATE 0x0010 /* create, but fail if attr exists */
#define LIBXFS_ATTR_REPLACE 0x0020 /* set, but fail if attr not exists */
+#define LIBXFS_ATTR_PARENT 0x0040 /* parent pointer entry */
/*
* Project quota id helpers (previously projid was 16bit only and using two
Index: b/include/xfs_da_format.h
===================================================================
--- a/include/xfs_da_format.h
+++ b/include/xfs_da_format.h
@@ -1242,24 +1242,28 @@ struct xfs_attr3_icleaf_hdr {
#define XFS_ATTR_LOCAL_BIT 0 /* attr is stored locally */
#define XFS_ATTR_ROOT_BIT 1 /* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT 2 /* limit access to secure attrs */
+#define XFS_ATTR_PARENT_BIT 3 /* parent pointer entry */
#define XFS_ATTR_INCOMPLETE_BIT 7 /* attr in middle of create/delete */
#define XFS_ATTR_LOCAL (1 << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT (1 << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE (1 << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT (1 << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_INCOMPLETE (1 << XFS_ATTR_INCOMPLETE_BIT)
/*
* Conversion macros for converting namespace bits from argument flags
* to ondisk flags.
*/
-#define XFS_ATTR_NSP_ARGS_MASK (ATTR_ROOT | ATTR_SECURE)
-#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ARGS_MASK (ATTR_ROOT | ATTR_SECURE | ATTR_PARENT)
+#define XFS_ATTR_NSP_ONDISK_MASK (XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
#define XFS_ATTR_NSP_ONDISK(flags) ((flags) & XFS_ATTR_NSP_ONDISK_MASK)
#define XFS_ATTR_NSP_ARGS(flags) ((flags) & XFS_ATTR_NSP_ARGS_MASK)
#define XFS_ATTR_NSP_ARGS_TO_ONDISK(x) (((x) & ATTR_ROOT ? XFS_ATTR_ROOT : 0) |\
- ((x) & ATTR_SECURE ? XFS_ATTR_SECURE : 0))
+ ((x) & ATTR_SECURE ? XFS_ATTR_SECURE : 0) | \
+ ((x) & ATTR_PARENT ? XFS_ATTR_PARENT : 0))
#define XFS_ATTR_NSP_ONDISK_TO_ARGS(x) (((x) & XFS_ATTR_ROOT ? ATTR_ROOT : 0) |\
- ((x) & XFS_ATTR_SECURE ? ATTR_SECURE : 0))
+ ((x) & XFS_ATTR_SECURE ? ATTR_SECURE : 0) | \
+ ((x) & XFS_ATTR_PARENT ? ATTR_PARENT : 0))
/*
* Alignment for namelist and valuelist entries (since they are mixed
Index: b/libxfs/xfs.h
===================================================================
--- a/libxfs/xfs.h
+++ b/libxfs/xfs.h
@@ -97,6 +97,7 @@ typedef struct xfs_bmalloca {
#define ATTR_SECURE LIBXFS_ATTR_SECURE
#define ATTR_CREATE LIBXFS_ATTR_CREATE
#define ATTR_REPLACE LIBXFS_ATTR_REPLACE
+#define ATTR_PARENT LIBXFS_ATTR_PARENT
#define ATTR_KERNOTIME 0
#define ATTR_KERNOVAL 0
Index: b/repair/attr_repair.c
===================================================================
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -17,6 +17,7 @@
*/
#include <libxfs.h>
+#include <xfs_attr_leaf.h>
#include "globals.h"
#include "err_protos.h"
#include "attr_repair.h"
@@ -880,7 +881,8 @@ process_shortform_attr(
/* namecheck checks for / and null terminated for file names.
* attributes names currently follow the same rules.
*/
- if (namecheck((char *)¤tentry->nameval[0],
+ if (!(currententry->flags & XFS_ATTR_PARENT) &&
+ namecheck((char *)¤tentry->nameval[0],
currententry->namelen)) {
do_warn(
_("entry contains illegal character in shortform attribute name\n"));
@@ -1026,8 +1028,9 @@ process_leaf_attr_local(
xfs_attr_leaf_name_local_t *local;
local = xfs_attr3_leaf_name_local(leaf, i);
- if (local->namelen == 0 || namecheck((char *)&local->nameval[0],
- local->namelen)) {
+ if (local->namelen == 0 ||
+ (!(entry->flags & XFS_ATTR_PARENT) &&
+ namecheck((char *)&local->nameval[0], local->namelen))) {
do_warn(
_("attribute entry %d in attr block %u, inode %" PRIu64 " has bad name (namelen = %d)\n"),
i, da_bno, ino, local->namelen);
@@ -1081,10 +1084,11 @@ process_leaf_attr_remote(
remotep = xfs_attr3_leaf_name_remote(leaf, i);
- if (remotep->namelen == 0 || namecheck((char *)&remotep->name[0],
- remotep->namelen) ||
- be32_to_cpu(entry->hashval) !=
- libxfs_da_hashname((uchar_t *)&remotep->name[0],
+ if (remotep->namelen == 0 ||
+ (!(entry->flags & XFS_ATTR_PARENT) &&
+ namecheck((char *)&remotep->name[0],remotep->namelen)) ||
+ be32_to_cpu(entry->hashval) !=
+ libxfs_da_hashname((uchar_t *)&remotep->name[0],
remotep->namelen) ||
be32_to_cpu(entry->hashval) < last_hashval ||
be32_to_cpu(remotep->valueblk) == 0) {
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 15/17] xfsprogs: add basic parent pointer support to xfs_db
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (13 preceding siblings ...)
2014-01-15 22:00 ` [RFC 14/17] xfsprogs: add parent pointer values to headers and fix repair Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 16/17] xfsprogs: add parent pointer support to xfs_io Mark Tinguely
` (2 subsequent siblings)
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 03-xfsprogs-pptr-add-to-xfs_db.patch --]
[-- Type: text/plain, Size: 797 bytes --]
Add basic parent pointer support to xfs_db. Not at all complete.
---
db/inode.c | 2 ++
---
db/inode.c | 2 ++
1 file changed, 2 insertions(+)
Index: b/db/inode.c
===================================================================
--- a/db/inode.c
+++ b/db/inode.c
@@ -175,6 +175,8 @@ const field_t inode_v3_flds[] = {
{ "crtime", FLDT_TIMESTAMP, OI(COFF(crtime)), C1, 0, TYP_NONE },
{ "inumber", FLDT_INO, OI(COFF(ino)), C1, 0, TYP_NONE },
{ "uuid", FLDT_UUID, OI(COFF(uuid)), C1, 0, TYP_NONE },
+ { "parent", FLDT_UINT64D, OI(COFF(parent)), C1, 0, TYP_NONE },
+ { "poffset", FLDT_UINT32X, OI(OFF(poffset)), C1, 0, TYP_NONE },
{ NULL }
};
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 16/17] xfsprogs: add parent pointer support to xfs_io
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (14 preceding siblings ...)
2014-01-15 22:00 ` [RFC 15/17] xfsprogs: add basic parent pointer support to xfs_db Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-15 22:00 ` [RFC 17/17] xfsprogs: add parent GEOM information Mark Tinguely
2014-01-16 5:56 ` [RFC 00/17] RFC parent inode pointers Dave Chinner
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 04-xfsprogs-pptr-xfs_io-getparentpaths_by_handle.c --]
[-- Type: text/plain, Size: 12816 bytes --]
xfs_io parent -p display parent pointer information for file.
xfs_io parent -c check parent pointer consistency for filesystem.
---
include/handle.h | 6 ++--
include/jdm.h | 14 +++------
include/parent.h | 13 ++++----
include/xfs_fs.h | 11 +++++++
io/parent.c | 56 +++++++++++++++++++++++---------------
libhandle/handle.c | 78 +++++++++++++++++++++++++++++++++++++++++++----------
libhandle/jdm.c | 48 ++++++++++++++++++++++----------
7 files changed, 158 insertions(+), 68 deletions(-)
Index: b/include/handle.h
===================================================================
--- a/include/handle.h
+++ b/include/handle.h
@@ -24,7 +24,7 @@ extern "C" {
struct fsdmidata;
struct attrlist_cursor;
-struct parent;
+struct xfs_parent;
extern int path_to_handle (char *__path, void **__hanp, size_t *__hlen);
extern int path_to_fshandle (char *__path, void **__fshanp, size_t *__fshlen);
@@ -41,10 +41,10 @@ extern int attr_list_by_handle (void *_
size_t __bufsize, int __flags,
struct attrlist_cursor *__cursor);
extern int parents_by_handle(void *__hanp, size_t __hlen,
- struct parent *__buf, size_t __bufsize,
+ struct xfs_parent *__buf, size_t __bufsize,
unsigned int *__count);
extern int parentpaths_by_handle(void *__hanp, size_t __hlen,
- struct parent *__buf, size_t __bufsize,
+ struct xfs_parent *__buf, size_t __bufsize,
unsigned int *__count);
extern int fssetdm_by_handle (void *__hanp, size_t __hlen,
struct fsdmidata *__fsdmi);
Index: b/include/jdm.h
===================================================================
--- a/include/jdm.h
+++ b/include/jdm.h
@@ -24,7 +24,7 @@ typedef void jdm_filehandle_t; /* fileha
struct xfs_bstat;
struct attrlist_cursor;
-struct parent;
+struct xfs_parent;
extern jdm_fshandle_t *
jdm_getfshandle( char *mntpnt);
@@ -62,16 +62,12 @@ jdm_attr_list( jdm_fshandle_t *fshp,
struct attrlist_cursor *cursor);
extern int
-jdm_parents( jdm_fshandle_t *fshp,
- xfs_bstat_t *statp,
- struct parent *bufp, size_t bufsz,
- unsigned int *count);
+jdm_parents( jdm_fshandle_t *fshp, struct xfs_bstat *statp,
+ struct xfs_parent *bufp, size_t bufsz, unsigned int *count);
extern int
-jdm_parentpaths( jdm_fshandle_t *fshp,
- xfs_bstat_t *statp,
- struct parent *bufp, size_t bufsz,
- unsigned int *count);
+jdm_parentpaths( jdm_fshandle_t *fshp, struct xfs_bstat *statp,
+ struct xfs_parent *bufp, size_t bufsz, unsigned int *count);
/* macro for determining the size of a structure member */
#define sizeofmember( t, m ) sizeof( ( ( t * )0 )->m )
Index: b/include/parent.h
===================================================================
--- a/include/parent.h
+++ b/include/parent.h
@@ -18,12 +18,13 @@
#ifndef __PARENT_H__
#define __PARENT_H__
-typedef struct parent {
- __u64 p_ino;
- __u32 p_gen;
- __u16 p_reclen;
- char p_name[1];
-} parent_t;
+struct xfs_parent {
+ uint64_t p_ino; /*parent inode number */
+ uint32_t p_offset; /* entry offset in parent inode */
+ uint16_t p_reclen; /* name length */
+ uint16_t p_spare; /* padding for future */
+ char p_name[1]; /* name */
+};
typedef struct parent_cursor {
__u32 opaque[4]; /* an opaque cookie */
Index: b/include/xfs_fs.h
===================================================================
--- a/include/xfs_fs.h
+++ b/include/xfs_fs.h
@@ -438,6 +438,15 @@ typedef struct xfs_fsop_attrmulti_handle
struct xfs_attr_multiop __user *ops; /* attr_multi data */
} xfs_fsop_attrmulti_handlereq_t;
+
+typedef struct xfs_fsop_parentlist_handlereq {
+ struct xfs_fsop_handlereq hreq; /* handle interface structure */
+ __u32 flags; /* which namespace to use */
+ __u32 buflen; /* length of buffer supplied */
+ __u32 __user *ocount; /* output count ptr */
+ void __user *buffer; /* returned names */
+} xfs_fsop_parentlist_handlereq_t;
+
/*
* per machine unique filesystem identifier types.
*/
@@ -553,6 +562,8 @@ typedef struct xfs_swapext
#define XFS_IOC_ATTRMULTI_BY_HANDLE _IOW ('X', 123, struct xfs_fsop_attrmulti_handlereq)
#define XFS_IOC_FSGEOMETRY _IOR ('X', 124, struct xfs_fsop_geom)
#define XFS_IOC_GOINGDOWN _IOR ('X', 125, __uint32_t)
+#define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 126, struct xfs_fsop_parentlist_handlereq)
+#define XFS_IOC_GETPARENTPATHS_BY_HANDLE _IOWR('X', 127, struct xfs_fsop_parentlist_handlereq)
/* XFS_IOC_GETFSUUID ---------- deprecated 140 */
Index: b/io/parent.c
===================================================================
--- a/io/parent.c
+++ b/io/parent.c
@@ -39,14 +39,16 @@ static char *mntpt;
* check out a parent entry to see if the values seem valid
*/
static void
-check_parent_entry(xfs_bstat_t *bstatp, parent_t *parent)
+check_parent_entry(
+ struct xfs_bstat *bstatp,
+ struct xfs_parent *parent)
{
int sts;
char fullpath[PATH_MAX];
struct stat statbuf;
char *str;
- sprintf(fullpath, _("%s%s"), mntpt, parent->p_name);
+ sprintf(fullpath, _("%s/%s"), mntpt, parent->p_name);
sts = lstat(fullpath, &statbuf);
if (sts != 0) {
@@ -121,19 +123,23 @@ check_parent_entry(xfs_bstat_t *bstatp,
}
static void
-check_parents(parent_t *parentbuf, size_t *parentbuf_size,
- jdm_fshandle_t *fshandlep, xfs_bstat_t *statp)
+check_parents(
+ struct xfs_parent *parentbuf,
+ size_t *parentbuf_size,
+ jdm_fshandle_t *fshandlep,
+ struct xfs_bstat *statp)
{
- int error, i;
- __u32 count;
- parent_t *entryp;
+ int error, i;
+ __u32 count = 0;
+ struct xfs_parent *entryp;
do {
error = jdm_parentpaths(fshandlep, statp, parentbuf, *parentbuf_size, &count);
if (error == ERANGE) {
*parentbuf_size *= 2;
- parentbuf = (parent_t *)realloc(parentbuf, *parentbuf_size);
+ parentbuf = (struct xfs_parent *)realloc(parentbuf,
+ *parentbuf_size);
} else if (error) {
fprintf(stderr, _("parentpaths failed for ino %llu: %s\n"),
(unsigned long long) statp->bs_ino,
@@ -154,13 +160,18 @@ check_parents(parent_t *parentbuf, size_
entryp = parentbuf;
for (i = 0; i < count; i++) {
check_parent_entry(statp, entryp);
- entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
+ entryp = (struct xfs_parent*) (((char*)entryp) +
+ entryp->p_reclen);
}
}
static int
-do_bulkstat(parent_t *parentbuf, size_t *parentbuf_size, xfs_bstat_t *bstatbuf,
- int fsfd, jdm_fshandle_t *fshandlep)
+do_bulkstat(
+ struct xfs_parent *parentbuf,
+ size_t *parentbuf_size,
+ struct xfs_bstat *bstatbuf,
+ int fsfd,
+ jdm_fshandle_t *fshandlep)
{
__s32 buflenout;
__u64 lastino = 0;
@@ -233,7 +244,7 @@ parent_check(void)
{
int fsfd;
jdm_fshandle_t *fshandlep;
- parent_t *parentbuf;
+ struct xfs_parent *parentbuf;
size_t parentbuf_size = PARENTBUF_SZ;
xfs_bstat_t *bstatbuf;
@@ -254,7 +265,7 @@ parent_check(void)
/* allocate buffers */
bstatbuf = (xfs_bstat_t *)calloc(BSTATBUF_SZ, sizeof(xfs_bstat_t));
- parentbuf = (parent_t *)malloc(parentbuf_size);
+ parentbuf = (struct xfs_parent *)malloc(parentbuf_size);
if (!bstatbuf || !parentbuf) {
fprintf(stderr, _("unable to allocate buffers: %s\n"),
strerror(errno));
@@ -276,13 +287,15 @@ parent_check(void)
}
static void
-print_parent_entry(parent_t *parent, int fullpath)
+print_parent_entry(
+ struct xfs_parent *parent,
+ int fullpath)
{
- printf(_("p_ino = %llu\n"), (unsigned long long) parent->p_ino);
- printf(_("p_gen = %u\n"), parent->p_gen);
+ printf(_("p_ino = %llu\n"), (unsigned long long) parent->p_ino);
+ printf(_("p_offset = %u\n"), parent->p_offset);
printf(_("p_reclen = %u\n"), parent->p_reclen);
if (fullpath)
- printf(_("p_name = \"%s%s\"\n"), mntpt, parent->p_name);
+ printf(_("p_name = \"%s/%s\"\n"), mntpt, parent->p_name);
else
printf(_("p_name = \"%s\"\n"), parent->p_name);
}
@@ -295,8 +308,8 @@ parent_list(int fullpath)
int error, i;
int retval = 1;
__u32 count;
- parent_t *entryp;
- parent_t *parentbuf = NULL;
+ struct xfs_parent *entryp;
+ struct xfs_parent *parentbuf = NULL;
char *path = file->name;
int pb_size = PARENTBUF_SZ;
@@ -318,7 +331,7 @@ parent_list(int fullpath)
}
do {
- parentbuf = (parent_t *)realloc(parentbuf, pb_size);
+ parentbuf = (struct xfs_parent *)realloc(parentbuf, pb_size);
if (!parentbuf) {
fprintf(stderr, _("%s: unable to allocate parent buffer: %s\n"),
progname, strerror(errno));
@@ -357,7 +370,8 @@ parent_list(int fullpath)
entryp = parentbuf;
for (i = 0; i < count; i++) {
print_parent_entry(entryp, fullpath);
- entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
+ entryp = (struct xfs_parent *) (((char*)entryp) +
+ entryp->p_reclen);
}
retval = 0;
Index: b/libhandle/handle.c
===================================================================
--- a/libhandle/handle.c
+++ b/libhandle/handle.c
@@ -405,27 +405,77 @@ attr_list_by_handle(
int
parents_by_handle(
- void *hanp,
- size_t hlen,
- parent_t *buf,
- size_t bufsiz,
- unsigned int *count)
+ void *hanp,
+ size_t hlen,
+ struct xfs_parent *buf,
+ size_t bufsiz,
+ unsigned int *count)
{
- errno = EOPNOTSUPP;
- return -1;
+ int error, fd;
+ char *path;
+ char buffer[MAXPATHLEN+1];
+ unsigned int buflen = MAXPATHLEN;
+ struct xfs_fsop_parentlist_handlereq plheq;
+
+ if ((fd = handle_to_fsfd(hanp, &path)) < 0)
+ return -1;
+
+ plheq.hreq.fd = 0;
+ plheq.hreq.path = NULL;
+ plheq.hreq.oflags = O_LARGEFILE;
+ plheq.hreq.ihandle = hanp;
+ plheq.hreq.ihandlen = hlen;
+ plheq.hreq.ohandle = buffer;
+ plheq.hreq.ohandlen = &buflen;
+ plheq.flags = 0x0040;
+ plheq.buffer = buf;
+ plheq.buflen = bufsiz;
+ plheq.ocount = count;
+
+ /* prevent needless EINVAL from the kernel */
+ if (plheq.buflen > XATTR_LIST_MAX)
+ plheq.buflen = XATTR_LIST_MAX;
+
+ error = xfsctl(path, fd, XFS_IOC_GETPARENTS_BY_HANDLE, &plheq);
+ return error;
}
int
parentpaths_by_handle(
- void *hanp,
- size_t hlen,
- parent_t *buf,
- size_t bufsiz,
- unsigned int *count)
+ void *hanp,
+ size_t hlen,
+ struct xfs_parent *buf,
+ size_t bufsiz,
+ unsigned int *count)
{
- errno = EOPNOTSUPP;
- return -1;
+ int error, fd;
+ char *path;
+ char buffer[MAXPATHLEN+1];
+ unsigned int buflen = MAXPATHLEN;
+ struct xfs_fsop_parentlist_handlereq plheq;
+
+ if ((fd = handle_to_fsfd(hanp, &path)) < 0)
+ return -1;
+
+ plheq.hreq.fd = 0;
+ plheq.hreq.path = NULL;
+ plheq.hreq.oflags = O_LARGEFILE;
+ plheq.hreq.ihandle = hanp;
+ plheq.hreq.ihandlen = hlen;
+ plheq.hreq.ohandle = buffer;
+ plheq.hreq.ohandlen = &buflen;
+ plheq.flags = 0x0040;
+ plheq.buffer = buf;
+ plheq.buflen = bufsiz;
+ plheq.ocount = count;
+
+ /* prevent needless EINVAL from the kernel */
+ if (plheq.buflen > XATTR_LIST_MAX)
+ plheq.buflen = XATTR_LIST_MAX;
+
+ error = xfsctl(path, fd, XFS_IOC_GETPARENTPATHS_BY_HANDLE, &plheq);
+ return error;
}
int
Index: b/libhandle/jdm.c
===================================================================
--- a/libhandle/jdm.c
+++ b/libhandle/jdm.c
@@ -18,8 +18,8 @@
#include <xfs/xfs.h>
#include <xfs/handle.h>
-#include <xfs/jdm.h>
#include <xfs/parent.h>
+#include <xfs/jdm.h>
/* internal fshandle - typecast to a void for external use */
#define FSHANDLE_SZ 8
@@ -178,21 +178,39 @@ jdm_attr_list( jdm_fshandle_t *fshp,
}
int
-jdm_parents( jdm_fshandle_t *fshp,
- xfs_bstat_t *statp,
- parent_t *bufp, size_t bufsz,
- unsigned int *count)
-{
- errno = EOPNOTSUPP;
- return -1;
+jdm_parents(
+ jdm_fshandle_t *fshp,
+ struct xfs_bstat *statp,
+ struct xfs_parent *bufp,
+ size_t bufsz,
+ unsigned int *count)
+{
+ struct xfs_handle *handle = (struct xfs_handle *) fshp;
+
+ handle->ha_fid.fid_ino = statp->bs_ino;
+ handle->ha_fid.fid_gen = statp->bs_gen;
+ handle->ha_fid.fid_len =
+ sizeof(handle->ha_fid) - sizeof(handle->ha_fid.fid_len);
+
+ return parents_by_handle(handle, sizeof(struct xfs_handle), bufp,
+ bufsz, count);
}
int
-jdm_parentpaths( jdm_fshandle_t *fshp,
- xfs_bstat_t *statp,
- parent_t *bufp, size_t bufsz,
- unsigned int *count)
-{
- errno = EOPNOTSUPP;
- return -1;
+jdm_parentpaths(
+ jdm_fshandle_t *fshp,
+ struct xfs_bstat *statp,
+ struct xfs_parent *bufp,
+ size_t bufsz,
+ unsigned int *count)
+{
+ struct xfs_handle *handle = (struct xfs_handle *) fshp;
+
+ handle->ha_fid.fid_ino = statp->bs_ino;
+ handle->ha_fid.fid_gen = statp->bs_gen;
+ handle->ha_fid.fid_len =
+ sizeof(handle->ha_fid) - sizeof(handle->ha_fid.fid_len);
+
+ return parentpaths_by_handle(handle, sizeof(struct xfs_handle), bufp,
+ bufsz, count);
}
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFC 17/17] xfsprogs: add parent GEOM information
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (15 preceding siblings ...)
2014-01-15 22:00 ` [RFC 16/17] xfsprogs: add parent pointer support to xfs_io Mark Tinguely
@ 2014-01-15 22:00 ` Mark Tinguely
2014-01-16 5:56 ` [RFC 00/17] RFC parent inode pointers Dave Chinner
17 siblings, 0 replies; 26+ messages in thread
From: Mark Tinguely @ 2014-01-15 22:00 UTC (permalink / raw)
To: xfs
[-- Attachment #1: 05-xfsprogs-pptr-add-to-xfs_info.patch --]
[-- Type: text/plain, Size: 3283 bytes --]
Add the XFS_FSOP_GEOM_FLAGS information for xfs_growfs/xfs_info.
---
growfs/xfs_growfs.c | 14 +++++++++-----
include/xfs_fs.h | 2 +-
2 files changed, 10 insertions(+), 6 deletions(-)
Index: b/growfs/xfs_growfs.c
===================================================================
--- a/growfs/xfs_growfs.c
+++ b/growfs/xfs_growfs.c
@@ -56,7 +56,8 @@ report_info(
int projid32bit,
int crcs_enabled,
int cimode,
- int ftype_enabled)
+ int ftype_enabled,
+ int parentptr)
{
printf(_(
"meta-data=%-22s isize=%-6u agcount=%u, agsize=%u blks\n"
@@ -64,7 +65,7 @@ report_info(
" =%-22s crc=%u\n"
"data =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
" =%-22s sunit=%-6u swidth=%u blks\n"
- "naming =version %-14u bsize=%-6u ascii-ci=%d ftype=%d\n"
+ "naming =version %-14u bsize=%-6u ascii-ci=%d ftype=%d parent=%d\n"
"log =%-22s bsize=%-6u blocks=%u, version=%u\n"
" =%-22s sectsz=%-5u sunit=%u blks, lazy-count=%u\n"
"realtime =%-22s extsz=%-6u blocks=%llu, rtextents=%llu\n"),
@@ -75,7 +76,7 @@ report_info(
"", geo.blocksize, (unsigned long long)geo.datablocks,
geo.imaxpct,
"", geo.sunit, geo.swidth,
- dirversion, geo.dirblocksize, cimode, ftype_enabled,
+ dirversion, geo.dirblocksize, cimode, ftype_enabled, parentptr,
isint ? _("internal") : logname ? logname : _("external"),
geo.blocksize, geo.logblocks, logversion,
"", geo.logsectsize, geo.logsunit / geo.blocksize, lazycount,
@@ -123,6 +124,7 @@ main(int argc, char **argv)
int projid32bit;
int crcs_enabled;
int ftype_enabled = 0;
+ int parent_enabled = 0;
progname = basename(argv[0]);
setlocale(LC_ALL, "");
@@ -245,11 +247,12 @@ main(int argc, char **argv)
projid32bit = geo.flags & XFS_FSOP_GEOM_FLAGS_PROJID32 ? 1 : 0;
crcs_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_V5SB ? 1 : 0;
ftype_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_FTYPE ? 1 : 0;
+ parent_enabled = geo.flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
if (nflag) {
report_info(geo, datadev, isint, logdev, rtdev,
lazycount, dirversion, logversion,
attrversion, projid32bit, crcs_enabled, ci,
- ftype_enabled);
+ ftype_enabled, parent_enabled);
exit(0);
}
@@ -286,7 +289,8 @@ main(int argc, char **argv)
report_info(geo, datadev, isint, logdev, rtdev,
lazycount, dirversion, logversion,
- attrversion, projid32bit, crcs_enabled, ci, ftype_enabled);
+ attrversion, projid32bit, crcs_enabled, ci,
+ ftype_enabled, parent_enabled);
ddsize = xi.dsize;
dlsize = ( xi.logBBsize? xi.logBBsize :
Index: b/include/xfs_fs.h
===================================================================
--- a/include/xfs_fs.h
+++ b/include/xfs_fs.h
@@ -238,7 +238,7 @@ typedef struct xfs_fsop_resblks {
#define XFS_FSOP_GEOM_FLAGS_LAZYSB 0x4000 /* lazy superblock counters */
#define XFS_FSOP_GEOM_FLAGS_V5SB 0x8000 /* version 5 superblock */
#define XFS_FSOP_GEOM_FLAGS_FTYPE 0x10000 /* inode directory types */
-
+#define XFS_FSOP_GEOM_FLAGS_PARENT 0x20000 /* parent inode support */
/*
* Minimum and maximum sizes need for growth checks.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
` (16 preceding siblings ...)
2014-01-15 22:00 ` [RFC 17/17] xfsprogs: add parent GEOM information Mark Tinguely
@ 2014-01-16 5:56 ` Dave Chinner
2014-01-17 21:25 ` Mark Tinguely
17 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2014-01-16 5:56 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Wed, Jan 15, 2014 at 04:00:12PM -0600, Mark Tinguely wrote:
> Yeah, yeah, this has gotten buried several times and better get out on
> the list for discussion before it gets buried yet again.
>
> Parent inode support allow XFS to quickly derive a file name and
> path from the mount point. This can aid in directory/path policies
> and can help relocate items during filesystem shrink.
>
> 1) Representation of a parent inode entry:
>
> There are several ways to represent the parent inode entry.
> A 2005 XFS meeting came up with the following ideas:
>
> 1) Storing the parent inode# list inside the inode with a separate field
> separate fork
Too complex, rejected in the first few minutes of discussion.
> 2) Storing the parent inode# list in EA names with null values
> EA: <name=inode#, value=NULL>
Not uniquely identifying the filename of the inode in parent
directory, so can't tell if name that points to inode is current.
Hard links are impossible to handle. never implemented.
> 3) As in (2) but store the 1st parent inode# in a field in the inode
> first-parent-ptr + <name=inode#, value=NULL>
Same problem, didn't handle hard links. Also not enough space in the
inode core to implement.
> 4) As in (2) but store the hardlink names as EA values
> EA: <name=inode#, value=fname>
Same problem as 2, but added complexity in requiring multiple code
paths to keep parent pointers up to date. Never answered the
question of "when we remove the parent in the inode, who is the new
"primary" parent? Never implemented.
> 5) As in (2) but store the EAs on directories as well as leaf files
> EAs on directories.
Tried to solve problems with 4) by adding more information to
directories. Even more complex, had problems with uniquely
identifying hard links, really convoluted transaction contexts and
locking. Never implemented.
> 6) Storing the parent inode# + directory offset in EA names with null values
> EA: <name=inode#diroffset, value=NULL>
Uniquely identifies the directory entry *slot*, but does not
necessarily identify correct filename. Handled hard links if you
ignore the parent identity uniqueness issue, but demonstrated
unscalable name lookup behaviour.
> The preferred method was #6.
As it was, this once looked like the solution. IIRC, this was the
design that was implemented for Irix and shipped in 6.5.28 but was
never used in production by anyone because of problems that became
apparent after it was shipped. It was fundamentally flawed, and
those flaws were uncovered when porting the Irix implementation to
Linux.
Using the directory offset was discovered to be problematic when
there were multiple hardlinks into the same directory for the same
inode. You can remove links and add links, and the only thing that
changes is the diroffset. Hence to do a safe reverse lookup, you
have to first lock the inode to stabilise the EA, then read the
parent pointer to get the parent directory, then get the parent
inode and lock it so that nobody is modifying it as we traverse it
during a offset->name lookup.
But this violated the VFS directory tree locking order, which is
parent->child. Think about a racing unlink that has locked the
parent inode vs a parent pointer lookup that has lock the child and
is trying to lock the parent. Once this problem was understood,
using directory offsets was considered a non-starter for the linux
port given how complex working around the inversion makes the
reverse path lookups.
I note that Mark's patch set will not trip over this,
because the XFS_IOC_GETPARENTS_BY_HANDLE code never actually locks
the child inode the handle points to. IOWs, it isn't safe against
concurrent modification of the inode, and hence would never deadlock
even if the problem was seen. It's more likely to have caused
filesystem corruption or crashed.
Further, not storing the filename in the attribute value has
scalability limitations. A reverse lookup requires reading the
directory entry to get the filename, and so if you have large
directories and/or a large number of reverse lookups to perform the
cost is huge. Every lookup has to walk the bmbt tree to find the
directory data block that contains the offset, then it needs to be
read from disk, and then it needs to be iterated to find the correct
entry. It's a huge amount of work, and it's unnecessary because we
have the name of the file at the time the parent pointer EA is
created...
> 7) (kind of (4) + (6))
> EA: <name=inode#diroffset, value=filename>
This mostly solved the reverse lookup deadlock because it was no
longer necessary to read the parent directory to find the filename.
It didn't solve it entirely, though, because of the parent inode
uniqueness issue. We still have to hold the child locked to
guarantee that when we look up the parent inode number it matches
what is in the EA, and so we still have to lock the path from
child->parent->root in volation of the VFs lock ordering. First
Linux solution considered, never implemented.
8) EA: <name=inode#,generation,diroffset>, value=<filename>
Solved the deadlock problems, the scalability problems, and
uniqueness. Early implementation showed up more flaws with using the
directory offset and multiple hard links into the same directory -
we could get races with unlink/link loads failing parent attribute
creation because there was already an attribute of that name on the
inode. This was because of the fact that inode locks are dropped
during transaction commits between directory and attribute
modifications. Hence we could lose parent pointers silently under
such workloads.
The solution that SGI then settled on was this:
9) EA: <name=inode#,generation,count>, value=<filename>
It solved all the unique identifier problems, and the lookup
scalability issues and all the deadlocks, and that was what was
implemented in the patch posted here in 2009:
http://thread.gmane.org/gmane.comp.file-systems.xfs.general/27772
With the addition of the inode generation number, we can look up the
parent inode without holding the child inode locked and know that we
got the right inode by verifying the inode/gen tuple match on the
inode we looked up. However, we haven't got the child locked, so we
need to have some other method of knowing if the child is still in
the directory once it is locked. (e.g. unlink/link races).
That's where the "count" field of the EA name comes from. Each
inode keeps a separate "parent counter" attribute so that hard links
to the same directory can be uniquely identified by the EA name. The
initial inode create always uses a value of "1", and when the first
hardlink is added the special attribute is created with a value of
2, and that is used (and incremented) on every subsequent hard link.
Hence every hardlink has a unique attribute name/value pair.
As a result, parent lookups (for path resolution or removal) can
match on ino/gen/value, knowing that if we race with an unlink and a
new hardlink of the *same name* there will be a second *unique*
parent pointer attribute (under a ino/gen/cnt+N name) that has the
same pointer information added to the tree for the new entry. Hence
it is safe to use whatever parent pointer attribute for a given
ino/gen/value we find because it is guaranteed to be valid for the
operation we are about to perform.
A potential optimisation to improve it is that parent counter does
not need to be in an EA. It's 32 bits, so can easily be added to the
spare space in the v2 inode and avoid unnecessary attribute reads
and writes on parent pointer creation.
Hence the parent pointer structure and implementation in 9) solves
all the known problems to do with hardlinks, scalability, locking
order, code complexity etc, and it has relatively little additional
runtime overhead. It also works for v4 filesystems - the proposed
patch set cannot be made to work on older filesystem structures as
there isn't room in the v2 inode for the fast-path fields.
So, there are several problems with approach 6) that the proposed
patchset is based on that were later discovered and solved. Mark,
most of this information was given to you (but in less detail) a
year ago. The patchset hasn't solved any of the flaws that were
pointed out back then, and I don't see how they can be with the
design that is being used.
I don't expect anyone to understand all the intricacies of the
problems that the 2009 parent pointer patchset solved. It's simple
code, but the problems it solves are subtle and complex and were
found the hard way - by shipping code for political/commercial
reasons before the entire problem space was understood.
That's the primary reason why SGI undertook to redesign parent
pointers for the Linux code base rather than port the Irix code -
the irix design was fundamentally flawed and a dead end. SGI might
have lost that historical XFS knowledge and perspective, but the
Linux XFS community hasn't.
Hence the problems with various parent pointer designs are still
well understood and, as such, the implementation needs to be done
properly and solve all the known design flaws we've found in the
past. To this day, the only parent pointer design that actually
avoids all the known problems is one we have from SGI from 2009....
> On the other hand keeping the directory and extended attribute in one
> transaction should keeping the changes atomic when the filesystem
> is forced down between the directory and attribute changes. Despite
> all the gore (see below) of doing the directory and attribute changes
> in one transaction, I think it is the correct thing to do.
Yes, I said that it was necessary a year ago and pointed you at how
to do it:
http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
Mark, it saddens me that you've wasted your time on a dead end. I wish
you had of spent your time implementing AMTs instead, and then just
forward ported the 2009 patch. If you'd done that a year ago, we'd
be just about ready to ship a working parent pointer implementation
instead having to go back to square one. :(
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-16 5:56 ` [RFC 00/17] RFC parent inode pointers Dave Chinner
@ 2014-01-17 21:25 ` Mark Tinguely
2014-01-18 3:12 ` Dave Chinner
0 siblings, 1 reply; 26+ messages in thread
From: Mark Tinguely @ 2014-01-17 21:25 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 01/15/14 23:56, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:00:12PM -0600, Mark Tinguely wrote:
>> Yeah, yeah, this has gotten buried several times and better get out on
>> the list for discussion before it gets buried yet again.
>>
>> Parent inode support allow XFS to quickly derive a file name and
>> path from the mount point. This can aid in directory/path policies
>> and can help relocate items during filesystem shrink.
>>
>> 1) Representation of a parent inode entry:
>>
>> There are several ways to represent the parent inode entry.
>> A 2005 XFS meeting came up with the following ideas:
>>
>> 1) Storing the parent inode# list inside the inode with a separate field
>> separate fork
>
> Too complex, rejected in the first few minutes of discussion.
>
>> 2) Storing the parent inode# list in EA names with null values
>> EA:<name=inode#, value=NULL>
>
> Not uniquely identifying the filename of the inode in parent
> directory, so can't tell if name that points to inode is current.
> Hard links are impossible to handle. never implemented.
>
>> 3) As in (2) but store the 1st parent inode# in a field in the inode
>> first-parent-ptr +<name=inode#, value=NULL>
>
> Same problem, didn't handle hard links. Also not enough space in the
> inode core to implement.
>
>> 4) As in (2) but store the hardlink names as EA values
>> EA:<name=inode#, value=fname>
>
> Same problem as 2, but added complexity in requiring multiple code
> paths to keep parent pointers up to date. Never answered the
> question of "when we remove the parent in the inode, who is the new
> "primary" parent? Never implemented.
>
>> 5) As in (2) but store the EAs on directories as well as leaf files
>> EAs on directories.
>
> Tried to solve problems with 4) by adding more information to
> directories. Even more complex, had problems with uniquely
> identifying hard links, really convoluted transaction contexts and
> locking. Never implemented.
>
>> 6) Storing the parent inode# + directory offset in EA names with null values
>> EA:<name=inode#diroffset, value=NULL>
>
> Uniquely identifies the directory entry *slot*, but does not
> necessarily identify correct filename. Handled hard links if you
> ignore the parent identity uniqueness issue, but demonstrated
> unscalable name lookup behaviour.
how is this the wrong file name?
How is a directory xfs_readname() in a ioctl unscalable?
Adding file names to a EA, that limits scalability. Having to do a EA
sequential lookup in the remove/rename data paths to find a EA to
remove, that is unscalable.
>
>> The preferred method was #6.
>
> As it was, this once looked like the solution. IIRC, this was the
> design that was implemented for Irix and shipped in 6.5.28 but was
> never used in production by anyone because of problems that became
> apparent after it was shipped. It was fundamentally flawed, and
> those flaws were uncovered when porting the Irix implementation to
> Linux.
>
The Irix 6.5.X does not use this method. It uses stripped down version
of 2009 code. A <inode:generation> combination.
> Using the directory offset was discovered to be problematic when
> there were multiple hardlinks into the same directory for the same
> inode. You can remove links and add links, and the only thing that
> changes is the diroffset. Hence to do a safe reverse lookup, you
> have to first lock the inode to stabilise the EA, then read the
> parent pointer to get the parent directory, then get the parent
> inode and lock it so that nobody is modifying it as we traverse it
> during a offset->name lookup.
um, you are wrong, it is not this version of the code. It uses some
inode/generation combination.
>
> But this violated the VFS directory tree locking order, which is
> parent->child. Think about a racing unlink that has locked the
> parent inode vs a parent pointer lookup that has lock the child and
> is trying to lock the parent. Once this problem was understood,
> using directory offsets was considered a non-starter for the linux
> port given how complex working around the inversion makes the
> reverse path lookups.
>
> I note that Mark's patch set will not trip over this,
> because the XFS_IOC_GETPARENTS_BY_HANDLE code never actually locks
> the child inode the handle points to. IOWs, it isn't safe against
> concurrent modification of the inode, and hence would never deadlock
> even if the problem was seen. It's more likely to have caused
> filesystem corruption or crashed.
It has locked in the attribute iteration code. I see your point on the
locking order and we cannot use a callback to get the name. There are
very simple solutions for that.
>
> Further, not storing the filename in the attribute value has
> scalability limitations. A reverse lookup requires reading the
> directory entry to get the filename, and so if you have large
> directories and/or a large number of reverse lookups to perform the
> cost is huge. Every lookup has to walk the bmbt tree to find the
> directory data block that contains the offset, then it needs to be
> read from disk, and then it needs to be iterated to find the correct
> entry. It's a huge amount of work, and it's unnecessary because we
> have the name of the file at the time the parent pointer EA is
> created...
Happens only during the ioctl. The 2009 does far worse things in the
data path.
>
>> 7) (kind of (4) + (6))
>> EA:<name=inode#diroffset, value=filename>
>
> This mostly solved the reverse lookup deadlock because it was no
> longer necessary to read the parent directory to find the filename.
> It didn't solve it entirely, though, because of the parent inode
> uniqueness issue. We still have to hold the child locked to
> guarantee that when we look up the parent inode number it matches
> what is in the EA, and so we still have to lock the path from
> child->parent->root in volation of the VFs lock ordering. First
> Linux solution considered, never implemented.
>
> 8) EA:<name=inode#,generation,diroffset>, value=<filename>
>
> Solved the deadlock problems, the scalability problems, and
> uniqueness. Early implementation showed up more flaws with using the
> directory offset and multiple hard links into the same directory -
> we could get races with unlink/link loads failing parent attribute
> creation because there was already an attribute of that name on the
> inode. This was because of the fact that inode locks are dropped
> during transaction commits between directory and attribute
> modifications. Hence we could lose parent pointers silently under
> such workloads.
>
> The solution that SGI then settled on was this:
>
> 9) EA:<name=inode#,generation,count>, value=<filename>
>
> It solved all the unique identifier problems, and the lookup
> scalability issues and all the deadlocks, and that was what was
> implemented in the patch posted here in 2009:
>
> http://thread.gmane.org/gmane.comp.file-systems.xfs.general/27772
>
> With the addition of the inode generation number, we can look up the
> parent inode without holding the child inode locked and know that we
> got the right inode by verifying the inode/gen tuple match on the
> inode we looked up. However, we haven't got the child locked, so we
> need to have some other method of knowing if the child is still in
> the directory once it is locked. (e.g. unlink/link races).
The gen number changes only if the inode has been reallocated. You have
a parent inode and a filename, but you would have to do a xfs_lookup to
make sure that the parent is still the parent.
>
> That's where the "count" field of the EA name comes from. Each
> inode keeps a separate "parent counter" attribute so that hard links
> to the same directory can be uniquely identified by the EA name. The
> initial inode create always uses a value of "1", and when the first
> hardlink is added the special attribute is created with a value of
> 2, and that is used (and incremented) on every subsequent hard link.
> Hence every hardlink has a unique attribute name/value pair.
This code has never be proven. It appears to me that this will prevent
one kind of link/unlink race but to prevent the other, you need to hold
the lock over the directory code and the EA code.
>
> As a result, parent lookups (for path resolution or removal) can
> match on ino/gen/value, knowing that if we race with an unlink and a
> new hardlink of the *same name* there will be a second *unique*
> parent pointer attribute (under a ino/gen/cnt+N name) that has the
> same pointer information added to the tree for the new entry. Hence
> it is safe to use whatever parent pointer attribute for a given
> ino/gen/value we find because it is guaranteed to be valid for the
> operation we are about to perform.
The other race is not getting the EA into the inode before trying to
remove it.
>
> A potential optimisation to improve it is that parent counter does
> not need to be in an EA. It's 32 bits, so can easily be added to the
> spare space in the v2 inode and avoid unnecessary attribute reads
> and writes on parent pointer creation.
>
> Hence the parent pointer structure and implementation in 9) solves
> all the known problems to do with hardlinks, scalability, locking
> order, code complexity etc, and it has relatively little additional
> runtime overhead. It also works for v4 filesystems - the proposed
> patch set cannot be made to work on older filesystem structures as
> there isn't room in the v2 inode for the fast-path fields.
I know you love 2009, but lets talk about it scalability:
1) larger EA will make xfs_rename/xfs_link slower, and limit the link
we can support.
why did they do this = to make an ioctl a bit faster?
still need to do xfs_lookups to make sure the names are still valid.
2) in remove and rename data paths, it iterates over all the EAs
looking for the EA it must remove. That is a non-starter for me.
>
> So, there are several problems with approach 6) that the proposed
> patchset is based on that were later discovered and solved. Mark,
> most of this information was given to you (but in less detail) a
> year ago. The patchset hasn't solved any of the flaws that were
> pointed out back then, and I don't see how they can be with the
> design that is being used.
And the EA problems in the 2009 patch makes that a non-starter for me.
>
> I don't expect anyone to understand all the intricacies of the
> problems that the 2009 parent pointer patchset solved. It's simple
> code, but the problems it solves are subtle and complex and were
> found the hard way - by shipping code for political/commercial
> reasons before the entire problem space was understood.
>
> That's the primary reason why SGI undertook to redesign parent
> pointers for the Linux code base rather than port the Irix code -
> the irix design was fundamentally flawed and a dead end. SGI might
> have lost that historical XFS knowledge and perspective, but the
> Linux XFS community hasn't.
>
> Hence the problems with various parent pointer designs are still
> well understood and, as such, the implementation needs to be done
> properly and solve all the known design flaws we've found in the
> past. To this day, the only parent pointer design that actually
> avoids all the known problems is one we have from SGI from 2009....
>
>> On the other hand keeping the directory and extended attribute in one
>> transaction should keeping the changes atomic when the filesystem
>> is forced down between the directory and attribute changes. Despite
>> all the gore (see below) of doing the directory and attribute changes
>> in one transaction, I think it is the correct thing to do.
>
> Yes, I said that it was necessary a year ago and pointed you at how
> to do it:
>
> http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
>
That would limit the log space, but would not help in holding locks over
the directory and EA code. You think the counts solve that problem. It
solves one problem but not all.
The lock order problem in the getparents/getparentpaths is simple.
But in any version, we cannot make sure the getparentpath paths are
correct because of lock ordering at least with the .
> Mark, it saddens me that you've wasted your time on a dead end. I wish
> you had of spent your time implementing AMTs instead, and then just
> forward ported the 2009 patch. If you'd done that a year ago, we'd
> be just about ready to ship a working parent pointer implementation
> instead having to go back to square one. :(
>
> Cheers,
>
> Dave.
You concern is touching.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-17 21:25 ` Mark Tinguely
@ 2014-01-18 3:12 ` Dave Chinner
2014-01-27 19:41 ` Mark Tinguely
0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2014-01-18 3:12 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Fri, Jan 17, 2014 at 03:25:38PM -0600, Mark Tinguely wrote:
> On 01/15/14 23:56, Dave Chinner wrote:
> >On Wed, Jan 15, 2014 at 04:00:12PM -0600, Mark Tinguely wrote:
> >>Yeah, yeah, this has gotten buried several times and better get out on
> >>the list for discussion before it gets buried yet again.
> >>
> >>Parent inode support allow XFS to quickly derive a file name and
> >>path from the mount point. This can aid in directory/path policies
> >>and can help relocate items during filesystem shrink.
> >>
> >> 1) Representation of a parent inode entry:
> >>
> >> There are several ways to represent the parent inode entry.
> >> A 2005 XFS meeting came up with the following ideas:
> >>
> >> 1) Storing the parent inode# list inside the inode with a separate field
> >> separate fork
> >
> >Too complex, rejected in the first few minutes of discussion.
> >
> >> 2) Storing the parent inode# list in EA names with null values
> >> EA:<name=inode#, value=NULL>
> >
> >Not uniquely identifying the filename of the inode in parent
> >directory, so can't tell if name that points to inode is current.
> >Hard links are impossible to handle. never implemented.
> >
> >> 3) As in (2) but store the 1st parent inode# in a field in the inode
> >> first-parent-ptr +<name=inode#, value=NULL>
> >
> >Same problem, didn't handle hard links. Also not enough space in the
> >inode core to implement.
> >
> >> 4) As in (2) but store the hardlink names as EA values
> >> EA:<name=inode#, value=fname>
> >
> >Same problem as 2, but added complexity in requiring multiple code
> >paths to keep parent pointers up to date. Never answered the
> >question of "when we remove the parent in the inode, who is the new
> >"primary" parent? Never implemented.
> >
> >> 5) As in (2) but store the EAs on directories as well as leaf files
> >> EAs on directories.
> >
> >Tried to solve problems with 4) by adding more information to
> >directories. Even more complex, had problems with uniquely
> >identifying hard links, really convoluted transaction contexts and
> >locking. Never implemented.
> >
> >> 6) Storing the parent inode# + directory offset in EA names with null values
> >> EA:<name=inode#diroffset, value=NULL>
> >
> >Uniquely identifies the directory entry *slot*, but does not
> >necessarily identify correct filename. Handled hard links if you
> >ignore the parent identity uniqueness issue, but demonstrated
> >unscalable name lookup behaviour.
>
> how is this the wrong file name?
Because you can have a lookup/unlink/link race occur that puts a
different dirent at the directory offset between the EA lookup and
the locking of the parent directory. I suspect a lookup/rename race
could result in the same problem.
> How is a directory xfs_readname() in a ioctl unscalable?
So you do a million reverse lookups. That requires a million ioctl
calls to look up the reverse name attribute, plus a million times
the directory depth of the child calls to read directory entries.
i.e. the operation itself is fine, but it's the fact that it's
required at all that is the scalability problem. i.e. random reverse name
lookups will cause random directory IO to be done, and that does not
scale to doing millions of lookups in a short period of time.
> Adding file names to a EA, that limits scalability. Having to do a
> EA sequential lookup in the remove/rename data paths to find a EA to
> remove, that is unscalable.
Yup, it does that, but that's just a basic, brute force wildcard
search algorithm. I never said the 2009 patchset was perfect, just
that it solves all the known problems and fulfils all the requires
I've ever heard for parent pointer use cases and scalability.
ie. for the majority of use cases, this behaviour is never going to
be noticed as the attributes will be in line or in a single external
leaf and iterated extremely quickly.
Perhaps there's another tweak we can do to the attribute format that
makes this search on removal behaviour go away. Perhaps all we need
is an optimised attribute tree search-and-destroy operation....
> >> The preferred method was #6.
> >
> >As it was, this once looked like the solution. IIRC, this was the
> >design that was implemented for Irix and shipped in 6.5.28 but was
> >never used in production by anyone because of problems that became
> >apparent after it was shipped. It was fundamentally flawed, and
> >those flaws were uncovered when porting the Irix implementation to
> >Linux.
> >
>
> The Irix 6.5.X does not use this method. It uses stripped down
> version of 2009 code. A <inode:generation> combination.
Ah, so it did keep the generation number. My mistake, I haven't been
able to look at that code for more than 5 years, and I don't have
any of the documentation that you have and keep referring to. :/
> >Using the directory offset was discovered to be problematic when
> >there were multiple hardlinks into the same directory for the same
> >inode. You can remove links and add links, and the only thing that
> >changes is the diroffset. Hence to do a safe reverse lookup, you
> >have to first lock the inode to stabilise the EA, then read the
> >parent pointer to get the parent directory, then get the parent
> >inode and lock it so that nobody is modifying it as we traverse it
> >during a offset->name lookup.
>
> um, you are wrong, it is not this version of the code. It uses some
> inode/generation combination.
Sure, but your code doesn't use the inode/generation combination to
identify the parent, and so it has this problem. The fact I
misidentified what Irix implemented doesn't mean the problem I
described does not exist....
> >But this violated the VFS directory tree locking order, which is
> >parent->child. Think about a racing unlink that has locked the
> >parent inode vs a parent pointer lookup that has lock the child and
> >is trying to lock the parent. Once this problem was understood,
> >using directory offsets was considered a non-starter for the linux
> >port given how complex working around the inversion makes the
> >reverse path lookups.
> >
> >I note that Mark's patch set will not trip over this,
> >because the XFS_IOC_GETPARENTS_BY_HANDLE code never actually locks
> >the child inode the handle points to. IOWs, it isn't safe against
> >concurrent modification of the inode, and hence would never deadlock
> >even if the problem was seen. It's more likely to have caused
> >filesystem corruption or crashed.
>
> It has locked in the attribute iteration code.
Only to read the attribute. Then the lock is dropped, and so the
inode and attribute can be modified concurrently. You then use that
parent inode number to look up the parent directory, but there is no
validation that the parent directory obtained is still the same
generation of the inode that the parent pointer attribute pointed
at.
IOWs, there's no locking at the child to stabilise the parent
pointer at parent lookup time, nor is there any checks that the
parent inode that is found is the version of the inode that the
parent pointer attribute was pointing at.
This code *needs* the parent inode generation number because of the
locking model it uses.
> I see your point on
> the locking order and we cannot use a callback to get the name.
> There are very simple solutions for that.
Such as?
> >Further, not storing the filename in the attribute value has
> >scalability limitations. A reverse lookup requires reading the
> >directory entry to get the filename, and so if you have large
> >directories and/or a large number of reverse lookups to perform the
> >cost is huge. Every lookup has to walk the bmbt tree to find the
> >directory data block that contains the offset, then it needs to be
> >read from disk, and then it needs to be iterated to find the correct
> >entry. It's a huge amount of work, and it's unnecessary because we
> >have the name of the file at the time the parent pointer EA is
> >created...
>
> Happens only during the ioctl. The 2009 does far worse things in the
> data path.
Funnily enough, years ago we were always told that the reverse name
lookup was the performance critical path for DMF. i.e. we weren't
allowed to affect ongoing IO operations adversely when doing large
numbers of reverse lookups, and so we optimised for that. Indeed,
doing a single IO per reverse lookup was considered too much
overhead.
As such, you're missing a very important reason for putting the name
in the attribute - DMAPI has a special bulkstat implementation that
extracts in-line attributes along with the inode stat information.
The parent attribute hence could be read via a bulkstat scan, and
hence piggy-backed onto the DMF code that already did periodic scans
of the fileystem.
I have no idea whether DMF ever implemented this, but you should
begin to see why doing per-inode recursive readdir walks to
regenerate parent paths is about the least optimal method that can
be used - it simply can't be optimised into efficient bulk
operations that applications need to handle hundreds of millions of
inodes in a filesystem....
> >The solution that SGI then settled on was this:
> >
> > 9) EA:<name=inode#,generation,count>, value=<filename>
> >
> >It solved all the unique identifier problems, and the lookup
> >scalability issues and all the deadlocks, and that was what was
> >implemented in the patch posted here in 2009:
> >
> > http://thread.gmane.org/gmane.comp.file-systems.xfs.general/27772
> >
> >With the addition of the inode generation number, we can look up the
> >parent inode without holding the child inode locked and know that we
> >got the right inode by verifying the inode/gen tuple match on the
> >inode we looked up. However, we haven't got the child locked, so we
> >need to have some other method of knowing if the child is still in
> >the directory once it is locked. (e.g. unlink/link races).
>
> The gen number changes only if the inode has been reallocated. You
> have a parent inode and a filename, but you would have to do a
> xfs_lookup to make sure that the parent is still the parent.
xfs_lookup() doesn't tell you that the parent is still the parent.
It only tells you that a pathname still exists. It could be a different
inode that this pathname points to. If the directory is removed and
the recreated immediately, it's different directory, but it might
have the same inode number, and the only difference is the
generation number.
Further, races with rename can change the parent directory path
without changing anything else - it doesn't change the child's
parent pointer information at all, and so using lookup to confirm
the path of the parent inode is correct is also racy and not
sufficient to validate the fact you have the correct parent inode.
Again, the only way to identify an inode uniquely when you do not
holds the necesary locks to stabilise the pointer and the pointee
is via the {inode, generation} pair. There is no other way this can
be done.
> >That's where the "count" field of the EA name comes from. Each
> >inode keeps a separate "parent counter" attribute so that hard links
> >to the same directory can be uniquely identified by the EA name. The
> >initial inode create always uses a value of "1", and when the first
> >hardlink is added the special attribute is created with a value of
> >2, and that is used (and incremented) on every subsequent hard link.
> >Hence every hardlink has a unique attribute name/value pair.
>
> This code has never be proven.
Which means what? It was proven to solve the problems that we knew
about at the time....
> It appears to me that this will
> prevent one kind of link/unlink race but to prevent the other, you
> need to hold the lock over the directory code and the EA code.
>
> >As a result, parent lookups (for path resolution or removal) can
> >match on ino/gen/value, knowing that if we race with an unlink and a
> >new hardlink of the *same name* there will be a second *unique*
> >parent pointer attribute (under a ino/gen/cnt+N name) that has the
> >same pointer information added to the tree for the new entry. Hence
> >it is safe to use whatever parent pointer attribute for a given
> >ino/gen/value we find because it is guaranteed to be valid for the
> >operation we are about to perform.
>
> The other race is not getting the EA into the inode before trying to
> remove it.
Can't happen.
I'm not sure that you noticed that the 2009 patchset added
the parent EA in the same transaction that committed the directory
modifications (for create, mkdir and symlink). Hence wasn't in a
separate transaction and so the parent EA was created atomically
with the inode/dirent entry, so the race condition you are worried
about simple does not exist.
Further, even if we ignore the above and assume that the attribute
is added after the create transaction is committed and the inodes
are unlocked, there is no race on the child inode introduced here.
A link can't be unlinked until userspace can see the link,
and that happens when the dentry cache is updated to point at the
new link. That happens after the parent attribute is created, and so
we are guaranteed that there is always a parent EA to remove when
unlink is called.
i.e, all the races are on parent inode operations which invalidate
the child parent pointer. As such, the parent pointer object in the
child needs sufficient information to detect that the parent is
invalid....
> >A potential optimisation to improve it is that parent counter does
> >not need to be in an EA. It's 32 bits, so can easily be added to the
> >spare space in the v2 inode and avoid unnecessary attribute reads
> >and writes on parent pointer creation.
> >
> >Hence the parent pointer structure and implementation in 9) solves
> >all the known problems to do with hardlinks, scalability, locking
> >order, code complexity etc, and it has relatively little additional
> >runtime overhead. It also works for v4 filesystems - the proposed
> >patch set cannot be made to work on older filesystem structures as
> >there isn't room in the v2 inode for the fast-path fields.
>
> I know you love 2009, but lets talk about it scalability:
> 1) larger EA will make xfs_rename/xfs_link slower, and limit the link
> we can support.
Yes, it will be slightly slower, but it doesn't introduce any new
limitations on anything. Indeed, an EA was considered acceptible
per-operation overhead before we had delayed logging - it's no more
overhead than having a default ACL configured. And now all that an
additional EA costs really is CPU overhead. Given that CPU is rarely
the limiting factor for server workloads these days.....
> why did they do this = to make an ioctl a bit faster?
Minimising the IO and cpu overhead of individual lookups was
considered extremely important for cases where large numbers of
random lookups were required - DMF reporting/auditing tools needed
to be able to do this for large number of individual filehandles at
random times and had bound runtime limits. The scale we were given
at the time (remember, this was 2005) was that it had to work well
for filesystems containing hundreds of millions of inodes and having
to do reverse lookups on a significant fraction of those inodes
*every few hours*.
Hence being able to use bulkstat to optimise large-scale reverse
lookups is a killer feature that only the "name in attribute" method
enables....
> still need to do xfs_lookups to make sure the names are still valid.
Not necessarily - DMF listens to all the changes going on, so it can
track which pathname *components* have been moved, unlinked, etc and
so didn't need to revalidate the pathnames of every inode once it
had rebuilt them from filehandles - only those whose pathname
components where known to have changed.
> 2) in remove and rename data paths, it iterates over all the EAs
> looking for the EA it must remove. That is a non-starter for me.
See my comments above about that.
> >> On the other hand keeping the directory and extended attribute in one
> >> transaction should keeping the changes atomic when the filesystem
> >> is forced down between the directory and attribute changes. Despite
> >> all the gore (see below) of doing the directory and attribute changes
> >> in one transaction, I think it is the correct thing to do.
> >
> >Yes, I said that it was necessary a year ago and pointed you at how
> >to do it:
> >
> >http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead#Atomic_Multi-Transaction_Operations
> >
>
> That would limit the log space, but would not help in holding locks
> over the directory and EA code. You think the counts solve that
> problem. It solves one problem but not all.
The problem you think is there doesn't actually exist in the
2009 patchset because it does hold the inode locks over dirent
and EA creation.
> The lock order problem in the getparents/getparentpaths is simple.
You've said that twice without describing the simple solution. Can
you describe your solution, please?
> But in any version, we cannot make sure the getparentpath paths are
> correct because of lock ordering at least with the .
Right, which is why the path reconstruction shouldn't be in the
kernel - they can be invalid even before we've finishe dthe path
walk, let alone retuned to userspace. With inode/gen being pushed to
userspace with the filename, userspace can reconstruct and validate
the pathname components individually and optimally....
Mark, don't get me wrong - the 2009 patchset is not perfect and it's
not finished and it simply reflects what we knew at the time. When I
refer to that patch, I'm comparing the architecture and design of
the different parent pointer approaches, not the implementation.
The design has to be sound before I care about the implementation
and quality of the code. If we can't agree on basic architecture
and design points, then we are most definitely not going to agree on
the implementation.
Right now, the design of thxp eproposed patchset does not address
the critical problem of identifier uniqueness and ignores the
bulk-lookup performance requirements that we know about. Addressing
those are going to require a change of on-disk attribute format in
that patch set and that invalidates the in-inode-core optimisations
that have been made. IOWs, we need to solve the problem first, then
optimise.
So, what do we need in the parent pointer attribute to solve all the
known problems? The implementation will flow cleanly from what we
can store on disk, and we know that we need at least these things to
solve all the known issues:
* parent inode number and generation (unique identifier)
* link disambiguation (unlink/link race detection)
* filename (for bulk lookup performance)
So the question is how to implement the link disambiguation
efficiently. That is currently implemented in the 2009 patchset with
a the monotonic increasing counter that is appended to the attribute
name. Do we even need a generation count, or is there some other
info we can use that uniquely identifies a dirent?
While the diroffset of a filename is not unique enough to identify
the child, I think the {diroffset,filename,child_inode} tuple is
sufficient. That is, if the diroffset gets reused and points to a
different filename, we can detect that from the contents of EA and
abort. If a link of the same name is created, then we can check
whether it points at the same inode. If it does, then we just don't
care that there was a race because our current pointer is still
valid. And we don't need to store the child inode number in the EA -
we already have that in the child struct xfs_inode structure. That
verification can even be done in userspace.
Hence I think we've already got all the info we need if we make a
hybrid format from the two approaches:
name=parent_inode,gen,diroffset value=filename
The inode/gen gives all the information we need to reliably identify
the parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl
calls.
And finally, by using the diroffset in the EA name, we have a method of
knowing the exact parent pointer EA we need to modify/remove in
rename/unlink without an unbound searching.
I think that solves all the architectural issues that we know
about with both implemenations.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-18 3:12 ` Dave Chinner
@ 2014-01-27 19:41 ` Mark Tinguely
2014-01-28 3:00 ` Dave Chinner
0 siblings, 1 reply; 26+ messages in thread
From: Mark Tinguely @ 2014-01-27 19:41 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On 01/17/14 21:12, Dave Chinner wrote:
<massive delete we can go over it point by point if necessary, but let
us start here>.
1) Yep, the parent inode generation number is needed. I thought I said
it was, bad on me if I did not. It was an RFC and I was too lazy to
go back and add it in.
2) Add the filename to EA. Not a fan, but I will ask but if DMF needs it
for performance then it has to be done. My point was this assumes
that we can keep all the links' EA entries inline in the inode. A
couple 255 character files or several links of modest sized filenames
would negate that assumption. I tried to minimize the EA entries to
keep them inline in the inode. I will talk to the DMF group.
3) There is a unlink/link race because the directory and EA changes
are done without a common lock. I hit this in testing.
Assume the sequence was something like *:
ln a filename1 (EA saved to inode)
rm filename1
ln a filename1 (EA not saved because it is a duplicate)
(rm EA operation happens and removes the only PIP
entry)
....
rm filename1 (no EA entry error)
* My speculation from counters and my testing.
i) Why not add the lock to keep the directory/EA changes in sync?
ii) 2009 code required duplicate EA entries to compensate.
A) Required a counter/inode to make every link unique.
Granted this counter could be in a inode field.
B) Required a EA walk to find one of the duplicates entries
for the remove.
i) Mark no likey, much bitching and moaning...
C) More below.
> Mark, don't get me wrong - the 2009 patchset is not perfect and it's
> not finished and it simply reflects what we knew at the time. When I
> refer to that patch, I'm comparing the architecture and design of
> the different parent pointer approaches, not the implementation.
> The design has to be sound before I care about the implementation
> and quality of the code. If we can't agree on basic architecture
> and design points, then we are most definitely not going to agree on
> the implementation.
>
> Right now, the design of the proposed patchset does not address
> the critical problem of identifier uniqueness and ignores the
> bulk-lookup performance requirements that we know about. Addressing
> those are going to require a change of on-disk attribute format in
> that patch set and that invalidates the in-inode-core optimisations
> that have been made. IOWs, we need to solve the problem first, then
> optimise.
>
> So, what do we need in the parent pointer attribute to solve all the
> known problems? The implementation will flow cleanly from what we
> can store on disk, and we know that we need at least these things to
> solve all the known issues:
>
> * parent inode number and generation (unique identifier)
agreed
> * link disambiguation (unlink/link race detection)
why allow a unlink/link race?
> * filename (for bulk lookup performance)
>
> So the question is how to implement the link disambiguation
> efficiently. That is currently implemented in the 2009 patchset with
> a the monotonic increasing counter that is appended to the attribute
> name. Do we even need a generation count, or is there some other
> info we can use that uniquely identifies a dirent?
>
> While the diroffset of a filename is not unique enough to identify
> the child, I think the {diroffset,filename,child_inode} tuple is
> sufficient. That is, if the diroffset gets reused and points to a
> different filename, we can detect that from the contents of EA and
> abort. If a link of the same name is created, then we can check
> whether it points at the same inode. If it does, then we just don't
> care that there was a race because our current pointer is still
> valid. And we don't need to store the child inode number in the EA -
> we already have that in the child struct xfs_inode structure. That
> verification can even be done in userspace.
>
> Hence I think we've already got all the info we need if we make a
> hybrid format from the two approaches:
>
> name=parent_inode,gen,diroffset value=filename
>
> The inode/gen gives all the information we need to reliably identify
> the parent without requiring child->parent lock ordering, and allows
> userspace to do pathname component level reconstruction without the
> kernel ever needing to verify the parent itself as part of ioctl
> calls.
>
> And finally, by using the diroffset in the EA name, we have a method of
> knowing the exact parent pointer EA we need to modify/remove in
> rename/unlink without an unbound searching.
>
> I think that solves all the architectural issues that we know
> about with both implemenations.
>
> Cheers,
>
> Dave.
Thinking out loud:
EA names have to be unique.
A link/unlink/link EA sequence would have to do a EA RENAME (overwrite
the duplicate EA with new name).
Have to do either:
Do a EA lookup and compare before remove.
or
Add a new EA command that removes a name/value pair.
Not sure if this would work on more than one unlink/link race and seems
like this would still not work if filename of the 2 links are
the same.
Leaving a known race makes me a bit queezy. My internal version uses
locks, but I were clear that you did not like the locks and so they were
not included in the RFC.
---- small hypothetical digression ----
If we could use the inode fields for a PIP entry (no filename
in the EA requirement), Olaf Weber came up with a clever PIP entry
EA swizzle that would leave all the PIP inserts/deletes to be done to
the incore inode fields at the same time as the directory operation.
It requires the offset(s) be looked up before the directory
insert/deletes. Pretty much academic if we cannot use in the inode
fields.
--Mark.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-27 19:41 ` Mark Tinguely
@ 2014-01-28 3:00 ` Dave Chinner
2014-01-28 22:02 ` Geoffrey Wehrman
0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2014-01-28 3:00 UTC (permalink / raw)
To: Mark Tinguely; +Cc: xfs
On Mon, Jan 27, 2014 at 01:41:47PM -0600, Mark Tinguely wrote:
> On 01/17/14 21:12, Dave Chinner wrote:
>
> <massive delete we can go over it point by point if necessary, but let
> us start here>.
>
> 1) Yep, the parent inode generation number is needed. I thought I said
> it was, bad on me if I did not. It was an RFC and I was too lazy to
> go back and add it in.
>
> 2) Add the filename to EA. Not a fan, but I will ask but if DMF needs it
> for performance then it has to be done. My point was this assumes
> that we can keep all the links' EA entries inline in the inode. A
> couple 255 character files or several links of modest sized filenames
> would negate that assumption. I tried to minimize the EA entries to
> keep them inline in the inode. I will talk to the DMF group.
Actually, I made the point about DMF needing them inline performance
because that's an argument SGI might find persuasive. What I didn't
say just then is that *I* need them inline, too, as online directory
tree scrubbing needs to be able to do bulks scans, as does
xfs_repair. However, I have idefinitely said this before in previous
parent poitner discussions, so it should be no surprise here...
> 3) There is a unlink/link race because the directory and EA changes
> are done without a common lock. I hit this in testing.
Because your patch set didn't contain the necessary link
discrimination information in the parent EA and so serialisation was
required?
> ln a filename1 (EA saved to inode)
> rm filename1
> ln a filename1 (EA not saved because it is a duplicate)
Why wasn't the parent EA removed in the 'rm filename1' syscall
context? You can't do a link operation while an unlink is in
progress because of the VFS directory mutex serialising namespace
operations. The only way I can see this happening is if you are
bypassing the VFS. i.e. this is a CXFS problem, yes?
> (rm EA operation happens and removes the only PIP
> entry)
> rm filename1 (no EA entry error)
>
> * My speculation from counters and my testing.
>
> i) Why not add the lock to keep the directory/EA changes in sync?
We don't need to add a lock - we already have one in the parent
inode lock. We can tell xfs_trans_commit() not to unlock objects
during the commit. i.e. using xfs_trans_ijoin(... 0) rather than
xfs_trans_ijoin(... XFS_ILOCK_EXCL) ensures an inode is not unlocked
during a commit and so we can do multiple rolling transactions on
that inode without unlocking it.
This is used throughout the attribute code so that the multiple
transactions needed to modify attributes are done atomically, same
as we do in xfs_inactive(), xfs_itruncate_extents(), etc. The same
needs to be done for the parent EA code, regardless of how we do the
multiple transactions.
> ii) 2009 code required duplicate EA entries to compensate.
> A) Required a counter/inode to make every link unique.
> Granted this counter could be in a inode field.
> B) Required a EA walk to find one of the duplicates entries
> for the remove.
> i) Mark no likey, much bitching and moaning...
> C) More below.
Yes, we know you don't like the implementation it used to provide
link discrimination. I've already proposed a way to fix the issues
you have with it, so let's move on....
> >Mark, don't get me wrong - the 2009 patchset is not perfect and it's
> >not finished and it simply reflects what we knew at the time. When I
> >refer to that patch, I'm comparing the architecture and design of
> >the different parent pointer approaches, not the implementation.
> >The design has to be sound before I care about the implementation
> >and quality of the code. If we can't agree on basic architecture
> >and design points, then we are most definitely not going to agree on
> >the implementation.
> >
> >Right now, the design of the proposed patchset does not address
> >the critical problem of identifier uniqueness and ignores the
> >bulk-lookup performance requirements that we know about. Addressing
> >those are going to require a change of on-disk attribute format in
> >that patch set and that invalidates the in-inode-core optimisations
> >that have been made. IOWs, we need to solve the problem first, then
> >optimise.
> >
> >So, what do we need in the parent pointer attribute to solve all the
> >known problems? The implementation will flow cleanly from what we
> >can store on disk, and we know that we need at least these things to
> >solve all the known issues:
> >
> > * parent inode number and generation (unique identifier)
>
> agreed
>
> > * link disambiguation (unlink/link race detection)
>
> why allow a unlink/link race?
I mentioned it as an example in the context of the discussion. Even
if we can avoid the link/unlink races, we still need link
disambiguation - we have to be able to do this to be able to use
parent pointers for validating and repairing the directory
structure.
> > * filename (for bulk lookup performance)
The filename is needed for directory structure validation and
repair, too.
Indeed, this is the primary reason I want parent pointers - for
xfs_repair and online filesystem scrubbing. If you don't have the
parent pointer giving all the information necessary to validate that
reverse and forward links match precisely, then you cannot use them
for validation and repair purposes.
For example, something went bad, and we then end up
with an inode with two parent pointers that point to the same
directory. The parent directory has one pointer to the inode, so
there's a dangling link on the child inode. Now, how do you tell if
we should connect the link back up to the parent, or whether we
should remove the EA from the child?
If we don't have the filename in the child's EA, then we can't
recover missing the parent dir entry at all - there's no redundant
information to work from. Hence we can only trash one of the child's
EAs and update it's link count. If the child has the filename and
the dir offset, then we can check if the dir_offset in the parent is
empty, and if it is we can reconnect the directory to the child.
Otherwise we trash the child's EA.
The killer repair feature, however, is that we can rebuild the
entire directory structure if inode parent EAs have filenames in
them. With the name, unique parent identifier and directory offset,
we can find all the unique directory entries that made up the
missing parent directory and reconstruct it. IOWs, we can recover
the entire filesystem from a lost root inode without leaving large
amounts of work for users to reconstruct the directory structure
from lost_found. IOWs, we can mostly remove the need for lost+found
with correctly structured parent pointers...
> >So the question is how to implement the link disambiguation
> >efficiently. That is currently implemented in the 2009 patchset with
> >a the monotonic increasing counter that is appended to the attribute
> >name. Do we even need a generation count, or is there some other
> >info we can use that uniquely identifies a dirent?
> >
> >While the diroffset of a filename is not unique enough to identify
> >the child, I think the {diroffset,filename,child_inode} tuple is
> >sufficient. That is, if the diroffset gets reused and points to a
> >different filename, we can detect that from the contents of EA and
> >abort. If a link of the same name is created, then we can check
> >whether it points at the same inode. If it does, then we just don't
> >care that there was a race because our current pointer is still
> >valid. And we don't need to store the child inode number in the EA -
> >we already have that in the child struct xfs_inode structure. That
> >verification can even be done in userspace.
> >
> >Hence I think we've already got all the info we need if we make a
> >hybrid format from the two approaches:
> >
> > name=parent_inode,gen,diroffset value=filename
> >
> >The inode/gen gives all the information we need to reliably identify
> >the parent without requiring child->parent lock ordering, and allows
> >userspace to do pathname component level reconstruction without the
> >kernel ever needing to verify the parent itself as part of ioctl
> >calls.
> >
> >And finally, by using the diroffset in the EA name, we have a method of
> >knowing the exact parent pointer EA we need to modify/remove in
> >rename/unlink without an unbound searching.
> >
> >I think that solves all the architectural issues that we know
> >about with both implemenations.
>
> Thinking out loud:
>
> EA names have to be unique.
Yes, but they need to be unique in a way that we can calculate from
the context of operations on the child inode so we can avoid
searching the attr tree for a matching name/value pair.
> A link/unlink/link EA sequence would have to do a EA RENAME (overwrite
> the duplicate EA with new name).
I'm afraid I don't follow you here - if EA names are unique, then
what exactly are we renaming? And if parent EA operations are atomic
with the directory name operations, then how can we ever get a
problem w.r.t. duplicate names as the directory operation will
prevent that from happening?
> Have to do either:
> Do a EA lookup and compare before remove.
I assume you mean "compare value before remove", because an EA
remove already does a EA lookup first to find the attribute location
in the attr tree before doing a remove. If the attr name is not
found, the ENOATTR is returned.
> or
> Add a new EA command that removes a name/value pair.
Which is trivial to add to the existing xfs_attr_remove() codepath.
I'm not sure it solves anything, though, if we use link
disambiguation to provide unique names and serialise EAs to
create/link/unlink/rename correctly.
> Not sure if this would work on more than one unlink/link race and seems
> like this would still not work if filename of the 2 links are
> the same.
>
> Leaving a known race makes me a bit queezy.
I've never proposed that we leave a known race *unhandled*. Races
don't have to be handled by locks - locks are the easiest way to
handle them, but not necessarily the best way. There are plenty of
race conditions in the code that are handled at the sites where the
race condition can be *reliably detected*.
> My internal version uses
> locks, but I were clear that you did not like the locks and so they were
> not included in the RFC.
You should *never* make assumptions about what any reviewer might
say about code that they've never seen. I expect developers to post
code that reflects the current state of their knowledge of a
problem. That's especially true for proposals and RFCs where holding
back information is frequently harmful to the discussion, and makes
people disinclined to engage with you in future.
Please don't make that mistake again.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-28 3:00 ` Dave Chinner
@ 2014-01-28 22:02 ` Geoffrey Wehrman
2014-02-04 0:09 ` Dave Chinner
0 siblings, 1 reply; 26+ messages in thread
From: Geoffrey Wehrman @ 2014-01-28 22:02 UTC (permalink / raw)
To: Dave Chinner; +Cc: Mark Tinguely, xfs
On Tue, Jan 28, 2014 at 02:00:40PM +1100, Dave Chinner wrote:
| On Mon, Jan 27, 2014 at 01:41:47PM -0600, Mark Tinguely wrote:
| > 2) Add the filename to EA. Not a fan, but I will ask but if DMF needs it
| > for performance then it has to be done. My point was this assumes
| > that we can keep all the links' EA entries inline in the inode. A
| > couple 255 character files or several links of modest sized filenames
| > would negate that assumption. I tried to minimize the EA entries to
| > keep them inline in the inode. I will talk to the DMF group.
|
| Actually, I made the point about DMF needing them inline performance
| because that's an argument SGI might find persuasive. What I didn't
| say just then is that *I* need them inline, too, as online directory
| tree scrubbing needs to be able to do bulks scans, as does
| xfs_repair. However, I have idefinitely said this before in previous
| parent poitner discussions, so it should be no surprise here...
I appologize in advance for my ignorance. What is "online directory
tree scrubbing" and how does it benefit from parent inode pointers?
--
Geoffrey Wehrman
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-01-28 22:02 ` Geoffrey Wehrman
@ 2014-02-04 0:09 ` Dave Chinner
2014-02-04 5:37 ` Geoffrey Wehrman
0 siblings, 1 reply; 26+ messages in thread
From: Dave Chinner @ 2014-02-04 0:09 UTC (permalink / raw)
To: Geoffrey Wehrman; +Cc: Mark Tinguely, xfs
On Tue, Jan 28, 2014 at 04:02:30PM -0600, Geoffrey Wehrman wrote:
> On Tue, Jan 28, 2014 at 02:00:40PM +1100, Dave Chinner wrote:
> | On Mon, Jan 27, 2014 at 01:41:47PM -0600, Mark Tinguely wrote:
> | > 2) Add the filename to EA. Not a fan, but I will ask but if DMF needs it
> | > for performance then it has to be done. My point was this assumes
> | > that we can keep all the links' EA entries inline in the inode. A
> | > couple 255 character files or several links of modest sized filenames
> | > would negate that assumption. I tried to minimize the EA entries to
> | > keep them inline in the inode. I will talk to the DMF group.
> |
> | Actually, I made the point about DMF needing them inline performance
> | because that's an argument SGI might find persuasive. What I didn't
> | say just then is that *I* need them inline, too, as online directory
> | tree scrubbing needs to be able to do bulks scans, as does
> | xfs_repair. However, I have idefinitely said this before in previous
> | parent poitner discussions, so it should be no surprise here...
>
> I appologize in advance for my ignorance. What is "online directory
> tree scrubbing" and how does it benefit from parent inode pointers?
It will be a kernel thread that walks the directory tree
periodically verifying that all inodes are reachable and that the
directory tree is intact. It's the same principle as RAID array
media scrubbing - proactive detection of errors that could cause
failures and repairing them before an error is delivered to the
application. The reason for wanting parent pointers is that it
enables immediate repair of most corruptions as the scrubbing
detects them without requiring duplicate copies of the metadata.
IOWs, parent pointers are a fundamental part of bringing on-line
repair functionality to XFS. While parent pointers are not directly
mentioned in this document:
http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption
There is a section on reverse mapping for used blocks and how to use
that for online repair of detected damage. Parent pointers provide
reverse mappings for the directory structure, and hence provide the
same functionality. Parent pointers also means that if we get an IO
error in a block in a file, we can identify the owner of the block
by full path and offset into the file where the error has
occurred...
The v5 metadata format has an owner field in all metadata that has a
single parent object for keeping such parent information, and it is
intended for improving scrub-based validation and for identifying
the owner of lost metadata blocks to improve recover from errors.
i.e. reverse mapping is a one of the fundamental architectural
requirements underlying the v5 metadata changes.
The two missing pieces are the parent pointers for directory
structure reverse mapping, and an AGF reverse mapping btree to
enable arbitrary block->owner lookups. I have prototype code for the
AGF rmap btree, and the parent pointers provide that functionality
for the directory structure. With both of these implemented, we will
be able to do fully-online filesystem metadata validity checking
equivalent to 'xfs_repair -n' or better.....
Reverse block mapping and directory structure parent pointers have
been considered necessary for robust exception handling and online
repair since we first started thinking about the CRC metadata format
changes....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [RFC 00/17] RFC parent inode pointers.
2014-02-04 0:09 ` Dave Chinner
@ 2014-02-04 5:37 ` Geoffrey Wehrman
0 siblings, 0 replies; 26+ messages in thread
From: Geoffrey Wehrman @ 2014-02-04 5:37 UTC (permalink / raw)
To: Dave Chinner; +Cc: Mark Tinguely, xfs
On Tue, Feb 04, 2014 at 11:09:44AM +1100, Dave Chinner wrote:
| On Tue, Jan 28, 2014 at 04:02:30PM -0600, Geoffrey Wehrman wrote:
| > On Tue, Jan 28, 2014 at 02:00:40PM +1100, Dave Chinner wrote:
| > | On Mon, Jan 27, 2014 at 01:41:47PM -0600, Mark Tinguely wrote:
| > | > 2) Add the filename to EA. Not a fan, but I will ask but if DMF needs it
| > | > for performance then it has to be done. My point was this assumes
| > | > that we can keep all the links' EA entries inline in the inode. A
| > | > couple 255 character files or several links of modest sized filenames
| > | > would negate that assumption. I tried to minimize the EA entries to
| > | > keep them inline in the inode. I will talk to the DMF group.
| > |
| > | Actually, I made the point about DMF needing them inline performance
| > | because that's an argument SGI might find persuasive. What I didn't
| > | say just then is that *I* need them inline, too, as online directory
| > | tree scrubbing needs to be able to do bulks scans, as does
| > | xfs_repair. However, I have idefinitely said this before in previous
| > | parent poitner discussions, so it should be no surprise here...
| >
| > I appologize in advance for my ignorance. What is "online directory
| > tree scrubbing" and how does it benefit from parent inode pointers?
|
| It will be a kernel thread that walks the directory tree
| periodically verifying that all inodes are reachable and that the
| directory tree is intact. It's the same principle as RAID array
| media scrubbing - proactive detection of errors that could cause
| failures and repairing them before an error is delivered to the
| application. The reason for wanting parent pointers is that it
| enables immediate repair of most corruptions as the scrubbing
| detects them without requiring duplicate copies of the metadata.
|
| IOWs, parent pointers are a fundamental part of bringing on-line
| repair functionality to XFS. While parent pointers are not directly
| mentioned in this document:
|
| http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption
|
| There is a section on reverse mapping for used blocks and how to use
| that for online repair of detected damage. Parent pointers provide
| reverse mappings for the directory structure, and hence provide the
| same functionality. Parent pointers also means that if we get an IO
| error in a block in a file, we can identify the owner of the block
| by full path and offset into the file where the error has
| occurred...
|
| The v5 metadata format has an owner field in all metadata that has a
| single parent object for keeping such parent information, and it is
| intended for improving scrub-based validation and for identifying
| the owner of lost metadata blocks to improve recover from errors.
| i.e. reverse mapping is a one of the fundamental architectural
| requirements underlying the v5 metadata changes.
|
| The two missing pieces are the parent pointers for directory
| structure reverse mapping, and an AGF reverse mapping btree to
| enable arbitrary block->owner lookups. I have prototype code for the
| AGF rmap btree, and the parent pointers provide that functionality
| for the directory structure. With both of these implemented, we will
| be able to do fully-online filesystem metadata validity checking
| equivalent to 'xfs_repair -n' or better.....
|
| Reverse block mapping and directory structure parent pointers have
| been considered necessary for robust exception handling and online
| repair since we first started thinking about the CRC metadata format
| changes....
I understand now. Thanks for the detailed explanation!
--
Geoffrey Wehrman 651-683-5496 gwehrman@sgi.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2014-02-04 5:38 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-15 22:00 [RFC 00/17] RFC parent inode pointers Mark Tinguely
2014-01-15 22:00 ` [RFC 01/17] xfs: (parent ptr) get offset when adding directory name Mark Tinguely
2014-01-15 22:00 ` [RFC 02/17] xfs: (parent ptr) get offset when removing " Mark Tinguely
2014-01-15 22:00 ` [RFC 03/17] xfs: (parent ptr) get offset when replacing a " Mark Tinguely
2014-01-15 22:00 ` [RFC 04/17] xfs: (parent ptr) add parent pointer support to xfs_sb.h Mark Tinguely
2014-01-15 22:00 ` [RFC 05/17] xfs: (parent ptr) add parent pointer support to attribute code Mark Tinguely
2014-01-15 22:00 ` [RFC 06/17] xfs: (parent ptr) add parent pointer support to inode v5 Mark Tinguely
2014-01-15 22:00 ` [RFC 07/17] xfs: (parent ptr) add parent pointer support to xfs_create Mark Tinguely
2014-01-15 22:00 ` [RFC 08/17] xfs: (parent ptr) add parent pointer support to xfs_symlink Mark Tinguely
2014-01-15 22:00 ` [RFC 09/17] xfs: (parent ptr) add parent pointer support to xfs_link Mark Tinguely
2014-01-15 22:00 ` [RFC 10/17] xfs: (parent ptr) add parent pointer support to xfs_remove Mark Tinguely
2014-01-15 22:00 ` [RFC 11/17] xfs: (parent ptr) add parent pointer support to xfs_rename Mark Tinguely
2014-01-15 22:00 ` [RFC 12/17] xfs: (parent ptr) add parent pointer support for user space Mark Tinguely
2014-01-15 22:00 ` [RFC 13/17] xfsprogs: add parent pointer support into Linux 3.10 inode 3 Mark Tinguely
2014-01-15 22:00 ` [RFC 14/17] xfsprogs: add parent pointer values to headers and fix repair Mark Tinguely
2014-01-15 22:00 ` [RFC 15/17] xfsprogs: add basic parent pointer support to xfs_db Mark Tinguely
2014-01-15 22:00 ` [RFC 16/17] xfsprogs: add parent pointer support to xfs_io Mark Tinguely
2014-01-15 22:00 ` [RFC 17/17] xfsprogs: add parent GEOM information Mark Tinguely
2014-01-16 5:56 ` [RFC 00/17] RFC parent inode pointers Dave Chinner
2014-01-17 21:25 ` Mark Tinguely
2014-01-18 3:12 ` Dave Chinner
2014-01-27 19:41 ` Mark Tinguely
2014-01-28 3:00 ` Dave Chinner
2014-01-28 22:02 ` Geoffrey Wehrman
2014-02-04 0:09 ` Dave Chinner
2014-02-04 5:37 ` Geoffrey Wehrman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).