* [patch 2/2] i_version update - ext4 part
@ 2007-05-25 16:25 Jean noel Cordenner
2007-05-29 19:44 ` Mingming Cao
0 siblings, 1 reply; 5+ messages in thread
From: Jean noel Cordenner @ 2007-05-25 16:25 UTC (permalink / raw)
To: linux-ext4, nfsv4, linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 471 bytes --]
The patch is on top of the ext4 tree:
http://repo.or.cz/w/ext4-patch-queue.git
In this part, the i_version counter is stored into 2 32bit fields of
the ext4_inode structure osd1.linux1.l_i_version and i_version_hi.
I included the ext4_expand_inode_extra_isize patch, which does part of
the job, checking if there is enough room for extra fields in the inode
(i_version_hi). The other patch increments the counter on inode
modifications and set it on inode creation.
[-- Attachment #2: ext4_expand_inode_extra_isize.patch --]
[-- Type: text/x-patch, Size: 15033 bytes --]
This patch is on top of the nanosecond timestamp and i_version_hi
patches.
This patch adds 64-bit inode version support to ext4. The lower 32 bits
are stored in the osd1.linux1.l_i_version field while the high 32 bits
are stored in the i_version_hi field newly created in the ext4_inode.
We need to make sure that existing filesystems can also avail the new
fields that have been added to the inode. We use s_want_extra_isize and
s_min_extra_isize to decide by how much we should expand the inode. If
EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set then we expand by
max(s_want_extra_isize, s_min_extra_isize , sizeof(ext4_inode) -
EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is still an open question
about whether users should be able to set s_*_extra_isize smaller than
the known fields or not.
This patch also adds the functionality to expand inodes to include the
newly added fields. We start by trying to expand by s_want_extra_isize
bytes and if its fails we try to expand by s_min_extra_isize bytes. This
is done by changing the i_extra_isize if enough space is available in
the inode and no EAs are present. If EAs are present and there is enough
space in the inode then the EAs in the inode are shifted to make space.
If enough space is not available in the inode due to the EAs then 1 or
more EAs are shifted to the external EA block. In the worst case when
even the external EA block does not have enough space we inform the user
that some EA would need to be deleted or s_min_extra_isize would have to
be reduced.
This would be online expansion of inodes. I am also working on adding an
"expand_inodes" option to e2fsck which will expand all the used inodes.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/inode.c 2007-05-25 17:12:37.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c 2007-05-25 17:12:41.000000000 +0200
@@ -2709,6 +2709,13 @@
EXT4_INODE_GET_XTIME(i_atime, inode, raw_inode);
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
+ ei->i_fs_version = le32_to_cpu(raw_inode->i_disk_version);
+ if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
+ if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
+ ei->i_fs_version |=
+ (__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
+ }
+
if (S_ISREG(inode->i_mode)) {
inode->i_op = &ext4_file_inode_operations;
inode->i_fop = &ext4_file_operations;
@@ -2852,8 +2859,14 @@
} else for (block = 0; block < EXT4_N_BLOCKS; block++)
raw_inode->i_block[block] = ei->i_data[block];
- if (ei->i_extra_isize)
+ raw_inode->i_disk_version = cpu_to_le32(ei->i_fs_version);
+ if (ei->i_extra_isize) {
+ if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi)) {
+ raw_inode->i_version_hi =
+ cpu_to_le32(ei->i_fs_version >> 32);
+ }
raw_inode->i_extra_isize = cpu_to_le16(ei->i_extra_isize);
+ }
BUFFER_TRACE(bh, "call ext4_journal_dirty_metadata");
rc = ext4_journal_dirty_metadata(handle, bh);
@@ -3127,10 +3140,32 @@
int ext4_mark_inode_dirty(handle_t *handle, struct inode *inode)
{
struct ext4_iloc iloc;
- int err;
+ int err, ret;
+ static int expand_message;
might_sleep();
err = ext4_reserve_inode_write(handle, inode, &iloc);
+ if (EXT4_I(inode)->i_extra_isize <
+ EXT4_SB(inode->i_sb)->s_want_extra_isize &&
+ !(EXT4_I(inode)->i_state & EXT4_STATE_NO_EXPAND)) {
+ /* We need extra buffer credits since we may write into EA block
+ * with this same handle */
+ if ((jbd2_journal_extend(handle,
+ EXT4_DATA_TRANS_BLOCKS(inode->i_sb))) == 0) {
+ ret = ext4_expand_extra_isize(inode,
+ EXT4_SB(inode->i_sb)->s_want_extra_isize,
+ iloc, handle);
+ if (ret) {
+ EXT4_I(inode)->i_state |= EXT4_STATE_NO_EXPAND;
+ if (!expand_message) {
+ ext4_warning(inode->i_sb, __FUNCTION__,
+ "Unable to expand inode %lu. Delete some"
+ " EAs or run e2fsck.", inode->i_ino);
+ expand_message = 1;
+ }
+ }
+ }
+ }
if (!err)
err = ext4_mark_iloc_dirty(handle, inode, &iloc);
return err;
Index: linux-2.6.22-rc2-ext4-1/include/linux/ext4_fs.h
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/include/linux/ext4_fs.h 2007-05-25 17:12:37.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/include/linux/ext4_fs.h 2007-05-25 17:12:41.000000000 +0200
@@ -202,6 +202,7 @@
#define EXT4_STATE_JDATA 0x00000001 /* journaled data exists */
#define EXT4_STATE_NEW 0x00000002 /* inode is newly created */
#define EXT4_STATE_XATTR 0x00000004 /* has in-inode xattrs */
+#define EXT4_STATE_NO_EXPAND 0x00000008 /* No space for expansion */
/* Used to pass group descriptor data when online resize is done */
struct ext4_new_group_input {
@@ -297,7 +298,7 @@
__le32 i_flags; /* File flags */
union {
struct {
- __u32 l_i_reserved1;
+ __u32 l_i_version;
} linux1;
struct {
__u32 h_i_translator;
@@ -405,6 +406,8 @@
raw_inode->xtime ## _extra); \
} while (0)
+#define i_disk_version osd1.linux1.l_i_version
+
#if defined(__KERNEL__) || defined(__linux__)
#define i_reserved1 osd1.linux1.l_i_reserved1
#define i_frag osd2.linux2.l_i_frag
Index: linux-2.6.22-rc2-ext4-1/include/linux/ext4_fs_i.h
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/include/linux/ext4_fs_i.h 2007-05-25 17:12:37.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/include/linux/ext4_fs_i.h 2007-05-25 17:12:41.000000000 +0200
@@ -154,6 +154,7 @@
unsigned long i_ext_generation;
struct ext4_ext_cache i_cached_extent;
struct timespec i_crtime;
+ __u64 i_fs_version;
};
#endif /* _LINUX_EXT4_FS_I */
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/xattr.c
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/xattr.c 2007-05-25 17:12:37.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/xattr.c 2007-05-25 17:12:41.000000000 +0200
@@ -508,6 +508,20 @@
return;
}
+static inline size_t ext3_xattr_free_space(struct ext4_xattr_entry *last,
+ size_t *min_offs, void *base, int *total)
+{
+ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
+ *total += EXT4_XATTR_LEN(last->e_name_len);
+ if (!last->e_value_block && last->e_value_size) {
+ size_t offs = le16_to_cpu(last->e_value_offs);
+ if (offs < *min_offs)
+ *min_offs = offs;
+ }
+ }
+ return (*min_offs - ((void *)last - base) - sizeof(__u32));
+}
+
struct ext4_xattr_info {
int name_index;
const char *name;
@@ -606,6 +620,7 @@
memmove(s->here, (void *)s->here + size,
(void *)last - (void *)s->here + sizeof(__u32));
memset(last, 0, size);
+
}
}
@@ -1014,6 +1029,8 @@
if (!error) {
ext4_xattr_update_super_block(handle, inode->i_sb);
inode->i_ctime = ext4_current_time(inode);
+ if(!value)
+ EXT4_I(inode)->i_state &= ~EXT4_STATE_NO_EXPAND;
error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);
/*
* The bh is consumed by ext4_mark_iloc_dirty, even with
@@ -1066,6 +1083,251 @@
return error;
}
+static void ext3_xattr_shift_entries(struct ext4_xattr_entry *entry,
+ int value_offs_shift, void *to,
+ void *from, size_t n, int blocksize)
+{
+ struct ext4_xattr_entry *last = entry;
+ int new_offs;
+
+ /* Adjust the value offsets of the entries */
+ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
+ if (!last->e_value_block && last->e_value_size) {
+ new_offs = le16_to_cpu(last->e_value_offs) +
+ value_offs_shift;
+ BUG_ON(new_offs + le32_to_cpu(last->e_value_size) > blocksize);
+ last->e_value_offs = cpu_to_le16(new_offs);
+ }
+ }
+ /* Shift the entries by n bytes */
+ memmove(to, from, n);
+}
+
+/* Expand an inode by new_extra_isize bytes.
+ * Returns 0 on success or negative error number on failure.
+ */
+int ext4_expand_extra_isize(struct inode *inode, int new_extra_isize,
+ struct ext4_iloc iloc, handle_t *handle)
+{
+ struct ext4_inode *raw_inode;
+ struct ext4_xattr_ibody_header *header;
+ struct ext4_xattr_entry *entry, *last, *first;
+ struct buffer_head *bh = NULL;
+ struct ext4_xattr_ibody_find *is = NULL;
+ struct ext4_xattr_block_find *bs = NULL;
+ char *buffer = NULL, *b_entry_name = NULL;
+ size_t min_offs, free;
+ int total_ino, total_blk;
+ void *base, *start, *end;
+ int extra_isize = 0, error = 0, tried_min_extra_isize = 0;
+ int s_min_extra_isize = EXT4_SB(inode->i_sb)->s_es->s_min_extra_isize;
+
+ down_write(&EXT4_I(inode)->xattr_sem);
+
+retry:
+ if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) {
+ up_write(&EXT4_I(inode)->xattr_sem);
+ return 0;
+ }
+
+ raw_inode = ext4_raw_inode(&iloc);
+
+ header = IHDR(inode, raw_inode);
+ entry = IFIRST(header);
+
+ /* No extended attributes present */
+ if (!(EXT4_I(inode)->i_state & EXT4_STATE_XATTR) ||
+ header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC)) {
+ memset((void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE, 0,
+ new_extra_isize);
+ EXT4_I(inode)->i_extra_isize = new_extra_isize;
+ goto cleanup;
+ }
+
+ /*
+ * Check if enough free space is available in the inode to shift the
+ * entries ahead by new_extra_isize.
+ */
+
+ base = start = entry;
+ end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
+ min_offs = end - base;
+ last = entry;
+ total_ino = sizeof(struct ext4_xattr_ibody_header);
+
+ free = ext3_xattr_free_space(last, &min_offs, base, &total_ino);
+ if (free >= new_extra_isize) {
+ entry = IFIRST(header);
+ ext3_xattr_shift_entries(entry, EXT4_I(inode)->i_extra_isize
+ - new_extra_isize, (void *)raw_inode +
+ EXT4_GOOD_OLD_INODE_SIZE + new_extra_isize,
+ (void *)header, total_ino,
+ inode->i_sb->s_blocksize);
+ EXT4_I(inode)->i_extra_isize = new_extra_isize;
+ error = 0;
+ goto cleanup;
+ }
+
+ /*
+ * Enough free space isn't available in the inode, check if
+ * EA block can hold new_extra_isize bytes.
+ */
+ if (EXT4_I(inode)->i_file_acl) {
+ bh = sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl);
+ error = -EIO;
+ if (!bh)
+ goto cleanup;
+ if (ext4_xattr_check_block(bh)) {
+ ext4_error(inode->i_sb, __FUNCTION__,
+ "inode %lu: bad block %llu", inode->i_ino,
+ EXT4_I(inode)->i_file_acl);
+ error = -EIO;
+ goto cleanup;
+ }
+ base = BHDR(bh);
+ first = BFIRST(bh);
+ end = bh->b_data + bh->b_size;
+ min_offs = end - base;
+ free = ext3_xattr_free_space(first, &min_offs, base,
+ &total_blk);
+ if (free < new_extra_isize) {
+ if (!tried_min_extra_isize && s_min_extra_isize) {
+ tried_min_extra_isize++;
+ new_extra_isize = s_min_extra_isize;
+ goto retry;
+ }
+ error = -1;
+ goto cleanup;
+ }
+ }
+ else {
+ free = inode->i_sb->s_blocksize;
+ }
+
+ while (new_extra_isize > 0) {
+ size_t offs, size, entry_size;
+ struct ext4_xattr_entry *small_entry = NULL;
+ struct ext4_xattr_info i = {
+ .value = NULL,
+ .value_len = 0,
+ };
+ unsigned int total_size, shift_bytes, temp = ~0U;
+
+ is = (struct ext4_xattr_ibody_find *) kmalloc(sizeof(struct
+ ext4_xattr_ibody_find), GFP_KERNEL);
+ bs = (struct ext4_xattr_block_find *) kmalloc(sizeof(struct
+ ext4_xattr_block_find), GFP_KERNEL);
+ memset((void *)is, 0, sizeof(struct ext4_xattr_ibody_find));
+ memset((void *)bs, 0, sizeof(struct ext4_xattr_block_find));
+
+ is->s.not_found = bs->s.not_found = -ENODATA;
+ is->iloc.bh = NULL;
+ bs->bh = NULL;
+
+ last = IFIRST(header);
+ /* Find the entry best suited to be pushed into EA block */
+ entry = NULL;
+ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) {
+ total_size = EXT4_XATTR_SIZE(le32_to_cpu(last->e_value_size)) +
+ EXT4_XATTR_LEN(last->e_name_len);
+ if (total_size <= free && total_size < temp) {
+ if (total_size < new_extra_isize) {
+ small_entry = last;
+ }
+ else {
+ entry = last;
+ temp = total_size;
+ }
+ }
+ }
+
+ if (entry == NULL) {
+ if (small_entry) {
+ entry = small_entry;
+ }
+ else {
+ if (!tried_min_extra_isize &&
+ s_min_extra_isize) {
+ tried_min_extra_isize++;
+ new_extra_isize = s_min_extra_isize;
+ goto retry;
+ }
+ error = -1;
+ goto cleanup;
+ }
+ }
+ offs = le16_to_cpu(entry->e_value_offs);
+ size = le32_to_cpu(entry->e_value_size);
+ entry_size = EXT4_XATTR_LEN(entry->e_name_len);
+ i.name_index = entry->e_name_index,
+ buffer = kmalloc(EXT4_XATTR_SIZE(size), GFP_KERNEL);
+ b_entry_name = kmalloc(entry->e_name_len + 1, GFP_KERNEL);
+ /* Save the entry name and the entry value */
+ memcpy((void *)buffer, (void *)IFIRST(header) + offs,
+ EXT4_XATTR_SIZE(size));
+ memcpy((void *)b_entry_name, (void *)entry->e_name,
+ entry->e_name_len);
+ b_entry_name[entry->e_name_len] = '\0';
+ i.name = b_entry_name;
+
+ error = ext4_get_inode_loc(inode, &is->iloc);
+ if (error)
+ goto cleanup;
+
+ error = ext4_xattr_ibody_find(inode, &i, is);
+ if (error)
+ goto cleanup;
+
+ /* Remove the chosen entry from the inode */
+ error = ext4_xattr_ibody_set(handle, inode, &i, is);
+
+ entry = IFIRST(header);
+ if (entry_size + EXT4_XATTR_SIZE(size) >= new_extra_isize)
+ shift_bytes = new_extra_isize;
+ else
+ shift_bytes = entry_size + size;
+ /* Adjust the offsets and shift the remaining entries ahead */
+ ext3_xattr_shift_entries(entry, EXT4_I(inode)->i_extra_isize -
+ shift_bytes, (void *)raw_inode +
+ EXT4_GOOD_OLD_INODE_SIZE + extra_isize + shift_bytes,
+ (void *)header, total_ino - entry_size,
+ inode->i_sb->s_blocksize);
+
+ extra_isize += shift_bytes;
+ new_extra_isize -= shift_bytes;
+ EXT4_I(inode)->i_extra_isize = extra_isize;
+
+ i.name = b_entry_name;
+ i.value = buffer;
+ i.value_len = cpu_to_le32(size);
+ error = ext4_xattr_block_find(inode, &i, bs);
+ if (error)
+ goto cleanup;
+
+ /* Add entry which was removed from the inode into the block */
+ error = ext4_xattr_block_set(handle, inode, &i, bs);
+ if (error)
+ goto cleanup;
+ }
+
+cleanup:
+ if (b_entry_name)
+ kfree(b_entry_name);
+ if (buffer)
+ kfree(buffer);
+ if (is) {
+ brelse(is->iloc.bh);
+ kfree(is);
+ }
+ if (bs)
+ kfree(bs);
+ brelse(bh);
+ up_write(&EXT4_I(inode)->xattr_sem);
+ return error;
+}
+
+
+
/*
* ext4_xattr_delete_inode()
*
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/xattr.h
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/xattr.h 2007-05-25 17:12:37.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/xattr.h 2007-05-25 17:12:41.000000000 +0200
@@ -74,6 +74,9 @@
extern void ext4_xattr_delete_inode(handle_t *, struct inode *);
extern void ext4_xattr_put_super(struct super_block *);
+int ext4_expand_extra_isize(struct inode *inode, int new_extra_isize,
+ struct ext4_iloc iloc, handle_t *handle);
+
extern int init_ext4_xattr(void);
extern void exit_ext4_xattr(void);
[-- Attachment #3: i_version_update_ext4 --]
[-- Type: text/plain, Size: 1762 bytes --]
This patch is on top of i_version_update_vfs.
The i_version field of the inode is set on inode creation and incremented when
the inode is being modified.
Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/ialloc.c 2007-05-25 18:05:28.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c 2007-05-25 18:05:40.000000000 +0200
@@ -565,6 +565,7 @@
inode->i_blocks = 0;
inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
ext4_current_time(inode);
+ inode->i_version = 1;
memset(ei->i_data, 0, sizeof(ei->i_data));
ei->i_dir_start_lookup = 0;
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/inode.c 2007-05-25 18:05:28.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c 2007-05-25 18:05:40.000000000 +0200
@@ -3082,6 +3082,7 @@
{
int err = 0;
+ inode->i_version++;
/* the do_update_inode consumes one bh->b_count */
get_bh(iloc->bh);
Index: linux-2.6.22-rc2-ext4-1/fs/ext4/super.c
===================================================================
--- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/super.c 2007-05-25 18:05:28.000000000 +0200
+++ linux-2.6.22-rc2-ext4-1/fs/ext4/super.c 2007-05-25 18:05:40.000000000 +0200
@@ -2839,8 +2839,8 @@
i_size_write(inode, off+len-towrite);
EXT4_I(inode)->i_disksize = inode->i_size;
}
- inode->i_version++;
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+ inode->i_version = 1;
ext4_mark_inode_dirty(handle, inode);
mutex_unlock(&inode->i_mutex);
return len - towrite;
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [patch 2/2] i_version update - ext4 part
2007-05-25 16:25 [patch 2/2] i_version update - ext4 part Jean noel Cordenner
@ 2007-05-29 19:44 ` Mingming Cao
2007-05-29 22:58 ` Andreas Dilger
0 siblings, 1 reply; 5+ messages in thread
From: Mingming Cao @ 2007-05-29 19:44 UTC (permalink / raw)
To: jean-noel.cordenner; +Cc: linux-ext4, nfsv4, linux-fsdevel
On Fri, 2007-05-25 at 18:25 +0200, Jean noel Cordenner wrote:
> The patch is on top of the ext4 tree:
> http://repo.or.cz/w/ext4-patch-queue.git
>
> In this part, the i_version counter is stored into 2 32bit fields of
> the ext4_inode structure osd1.linux1.l_i_version and i_version_hi.
>
> I included the ext4_expand_inode_extra_isize patch, which does part of
> the job, checking if there is enough room for extra fields in the inode
> (i_version_hi). The other patch increments the counter on inode
> modifications and set it on inode creation.
> plain text document attachment (i_version_update_ext4)
> This patch is on top of i_version_update_vfs.
> The i_version field of the inode is set on inode creation and incremented when
> the inode is being modified.
>
I am a little bit confused about the two patches.
It appears in the ext4_expand_inode_extra_isize patch by Kalpak, there a
new 64 bit i_fs_version field is added to ext4 inode structure for inode
versioning support. read/store of this counter are properly handled, but
missing the inode versioning counter update.
But later in the second patch by Jean Noel, he re-used the VFS inode-
>i_version for ext4 inode versioning, the counter is being updated every
time the file is being changed.
To me, i_fs_version and inode_version are the same thing, right?
Shouldn't we choose one(I assume inode i_version?), and combine these
two patch together? How about split the inode versioning part from the
ext4_expand_inode_extra_isize patch(it does multiple things, and
i_versioning doesn't longs there) and put it together with the rest of
i_version update patches?
BTW, how could NFS/user space to access the inode version counter?
Thanks,
Mingming
> Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
>
> Index: linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c
> ===================================================================
> --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/ialloc.c 2007-05-25 18:05:28.000000000 +0200
> +++ linux-2.6.22-rc2-ext4-1/fs/ext4/ialloc.c 2007-05-25 18:05:40.000000000 +0200
> @@ -565,6 +565,7 @@
> inode->i_blocks = 0;
> inode->i_mtime = inode->i_atime = inode->i_ctime = ei->i_crtime =
> ext4_current_time(inode);
> + inode->i_version = 1;
>
> memset(ei->i_data, 0, sizeof(ei->i_data));
> ei->i_dir_start_lookup = 0;
> Index: linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c
> ===================================================================
> --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/inode.c 2007-05-25 18:05:28.000000000 +0200
> +++ linux-2.6.22-rc2-ext4-1/fs/ext4/inode.c 2007-05-25 18:05:40.000000000 +0200
> @@ -3082,6 +3082,7 @@
> {
> int err = 0;
>
> + inode->i_version++;
> /* the do_update_inode consumes one bh->b_count */
> get_bh(iloc->bh);
>
> Index: linux-2.6.22-rc2-ext4-1/fs/ext4/super.c
> ===================================================================
> --- linux-2.6.22-rc2-ext4-1.orig/fs/ext4/super.c 2007-05-25 18:05:28.000000000 +0200
> +++ linux-2.6.22-rc2-ext4-1/fs/ext4/super.c 2007-05-25 18:05:40.000000000 +0200
> @@ -2839,8 +2839,8 @@
> i_size_write(inode, off+len-towrite);
> EXT4_I(inode)->i_disksize = inode->i_size;
> }
> - inode->i_version++;
> inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> + inode->i_version = 1;
> ext4_mark_inode_dirty(handle, inode);
> mutex_unlock(&inode->i_mutex);
> return len - towrite;
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [patch 2/2] i_version update - ext4 part
2007-05-29 19:44 ` Mingming Cao
@ 2007-05-29 22:58 ` Andreas Dilger
2007-05-30 23:48 ` Mingming Cao
0 siblings, 1 reply; 5+ messages in thread
From: Andreas Dilger @ 2007-05-29 22:58 UTC (permalink / raw)
To: Mingming Cao; +Cc: jean-noel.cordenner, linux-ext4, nfsv4, linux-fsdevel
On May 29, 2007 12:44 -0700, Mingming Cao wrote:
> I am a little bit confused about the two patches.
>
> It appears in the ext4_expand_inode_extra_isize patch by Kalpak, there a
> new 64 bit i_fs_version field is added to ext4 inode structure for inode
> versioning support. read/store of this counter are properly handled, but
> missing the inode versioning counter update.
For the Lustre use of the inode version we don't care about the VFS changes
to i_version. In fact - we want to be able to control the changes to
inode version ourselves so that e.g. file defragmenting or atime updates
don't change the inode version, and that recovery can restore the version
to a known state along with the rest of the metadata.
That said, since Lustre isn't in the kernel and we patch our version of
ext3 anyways it doesn't really matter what is done for NFS. We will just
patch in our own behaviour if the final ext4 code isn't suitable in all
of the details. Having 99% of the code the same at least makes this a
lot less work.
> But later in the second patch by Jean Noel, he re-used the VFS inode-
> >i_version for ext4 inode versioning, the counter is being updated every
> time the file is being changed.
I don't know what the NFS requirements for the version are. There may
also be some complaints from others if the i_version is 64 bits because
this contributes to generic inode growth and isn't used for other
filesystems.
> To me, i_fs_version and inode_version are the same thing, right?
> Shouldn't we choose one(I assume inode i_version?), and combine these
> two patch together? How about split the inode versioning part from the
> ext4_expand_inode_extra_isize patch(it does multiple things, and
> i_versioning doesn't longs there) and put it together with the rest of
> i_version update patches?
I don't have an objection to that, but I don't think it is required.
> BTW, how could NFS/user space to access the inode version counter?
If the Bull patch uses i_version then knfsd can just access it directly.
I don't think there is any API to access it from userspace. One option
is to add a virtual EA like user.inode_version and have the kernel fill
this in from i_version.
Lustre will manipulate the ei->i_fs_version directly.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch 2/2] i_version update - ext4 part
2007-05-29 22:58 ` Andreas Dilger
@ 2007-05-30 23:48 ` Mingming Cao
2007-05-31 2:25 ` Trond Myklebust
0 siblings, 1 reply; 5+ messages in thread
From: Mingming Cao @ 2007-05-30 23:48 UTC (permalink / raw)
To: Andreas Dilger; +Cc: jean-noel.cordenner, linux-ext4, nfsv4, linux-fsdevel
On Tue, 2007-05-29 at 16:58 -0600, Andreas Dilger wrote:
> On May 29, 2007 12:44 -0700, Mingming Cao wrote:
> > I am a little bit confused about the two patches.
> >
> > It appears in the ext4_expand_inode_extra_isize patch by Kalpak, there a
> > new 64 bit i_fs_version field is added to ext4 inode structure for inode
> > versioning support. read/store of this counter are properly handled, but
> > missing the inode versioning counter update.
>
> For the Lustre use of the inode version we don't care about the VFS changes
> to i_version. In fact - we want to be able to control the changes to
> inode version ourselves so that e.g. file defragmenting or atime updates
> don't change the inode version, and that recovery can restore the version
> to a known state along with the rest of the metadata.
>
It's a pity that VFS i_version can't serve Lustre need completely. :(
If the unnecessary inode version update is a concern, then, with current
implementation (i_version being updated in ext4_mark_inode_dirty()-
>ext4_mark_iloc_dirty()), the i_version can be increased multiple times
during one write() operation (unlike ctime update). I know doing the
update in ext4_mark_inode_dirty() (rather than update inode version on
every mtime/ctime update) clearly simplified the code changes. So I am
not sure if the increased update is a concern or not.
In any case, does the VFS inode version get update when atime updates?
> That said, since Lustre isn't in the kernel and we patch our version of
> ext3 anyways it doesn't really matter what is done for NFS. We will just
> patch in our own behaviour if the final ext4 code isn't suitable in all
> of the details. Having 99% of the code the same at least makes this a
> lot less work.
>
> > But later in the second patch by Jean Noel, he re-used the VFS inode-
> > >i_version for ext4 inode versioning, the counter is being updated every
> > time the file is being changed.
>
> I don't know what the NFS requirements for the version are. There may
> also be some complaints from others if the i_version is 64 bits because
> this contributes to generic inode growth and isn't used for other
> filesystems.
>
That should benefit for other filesystems, as I thought this NFS
requirements apply to all filesystems.
> > To me, i_fs_version and inode_version are the same thing, right?
> > Shouldn't we choose one(I assume inode i_version?), and combine these
> > two patch together? How about split the inode versioning part from the
> > ext4_expand_inode_extra_isize patch(it does multiple things, and
> > i_versioning doesn't longs there) and put it together with the rest of
> > i_version update patches?
>
> I don't have an objection to that, but I don't think it is required.
>
> > BTW, how could NFS/user space to access the inode version counter?
>
> If the Bull patch uses i_version then knfsd can just access it directly.
> I don't think there is any API to access it from userspace. One option
> is to add a virtual EA like user.inode_version and have the kernel fill
> this in from i_version.
>
> Lustre will manipulate the ei->i_fs_version directly.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [patch 2/2] i_version update - ext4 part
2007-05-30 23:48 ` Mingming Cao
@ 2007-05-31 2:25 ` Trond Myklebust
0 siblings, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2007-05-31 2:25 UTC (permalink / raw)
To: cmm; +Cc: Andreas Dilger, jean-noel.cordenner, linux-ext4, nfsv4,
linux-fsdevel
On Wed, 2007-05-30 at 16:48 -0700, Mingming Cao wrote:
> On Tue, 2007-05-29 at 16:58 -0600, Andreas Dilger wrote:
> > I don't know what the NFS requirements for the version are. There may
> > also be some complaints from others if the i_version is 64 bits because
> > this contributes to generic inode growth and isn't used for other
> > filesystems.
> >
> That should benefit for other filesystems, as I thought this NFS
> requirements apply to all filesystems.
Right. The point here is that the NFS protocol needs to impose certain
requirements on _all_ filesystems that want to be supported, and so it
is in the interest of everyone to have a generic update mechanism
available in the VFS in order to avoid code (and bug) replication.
Now if Lustre doesn't care about NFS compatibility, then I suppose it
would be fairly easy to engineer the i_version update interface to allow
them to use that field in whatever way suits them best.
Trond
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-05-31 2:25 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-25 16:25 [patch 2/2] i_version update - ext4 part Jean noel Cordenner
2007-05-29 19:44 ` Mingming Cao
2007-05-29 22:58 ` Andreas Dilger
2007-05-30 23:48 ` Mingming Cao
2007-05-31 2:25 ` Trond Myklebust
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).