* [Ocfs2-devel] ext2 attributes for OCFS2
@ 2006-06-10 12:32 Herbert Poetzl
2006-06-11 17:45 ` Mark Fasheh
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Poetzl @ 2006-06-10 12:32 UTC (permalink / raw)
To: ocfs2-devel
Hey Mark! Hey Kurt!
did a first hack on adding the iattrs (ext2)
to OCFS2. it is currently for Linux-VServer
patched kernels and I need to clean it up a
little more (and of course test it), but I
thought I'd ask you folks what you think about
the general idea and the implementation.
here is a link to the patch, if there is some
interest from your side, I'll prepare one for
mainline ...
http://vserver.13thfloor.at/Experimental/delta-ocfs2-feat01.diff
note: some attributes (like the noatime) do
not (yet) make sense for OCFS2, It's just a
generic approach ...
please ignore the ocfs2_sync_flags, it's only
used in Linux-VServer for now ...
TIA,
Herbert
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-10 12:32 [Ocfs2-devel] ext2 attributes for OCFS2 Herbert Poetzl
@ 2006-06-11 17:45 ` Mark Fasheh
2006-06-12 21:41 ` Herbert Poetzl
0 siblings, 1 reply; 7+ messages in thread
From: Mark Fasheh @ 2006-06-11 17:45 UTC (permalink / raw)
To: ocfs2-devel
Hey Herbert,
On Sat, Jun 10, 2006 at 02:32:53PM +0200, Herbert Poetzl wrote:
> did a first hack on adding the iattrs (ext2) to OCFS2. it is currently for
> Linux-VServer patched kernels and I need to clean it up a little more (and
> of course test it), but I thought I'd ask you folks what you think about
> the general idea and the implementation.
Well it definitely looks like things are going in a good direction :) How
many of these attributes were you planning to implement by the way? I'm
thinking we probably want to cover at least some of the basics
- immutable, appends... Maybe the vfs already handles those for us - I'll
have to read more code to figure that out. Patch comments below.
> here is a link to the patch, if there is some
> interest from your side, I'll prepare one for
> mainline ...
>
> http://vserver.13thfloor.at/Experimental/delta-ocfs2-feat01.diff
You probably want to move the cluster lock in the OCFS2_IOC_SETFLAGS case up
a bit so that you can be sure your inode image is up to date within the
cluster. Similarly, you'll probably want to take a read-only meta data lock
in the OCFS2_IOC_GETFLAGS case.
OCFS2 makes use of a small area associated with locks (called a Lock Value
Block) to stuff commonly used inode information. This helps the file system
avoid a disk read when updating inode information due to a new lock acquiry
- ocfs2_meta_lock_update() is the function responsible for this. These new
attributes should be put in the lvb as we'll want to be able to test their
existence by checking the struct inode under cluster lock (as opposed to
having to read off disk and checking the disk inode flags). Luckily this is
pretty easy - you already handled the disk cases by updating
ocfs2_populate_inode() and ocfs2_refresh_inode(). The lvb is a similar
process. I'll outline the steps below.
At the top of dlmglue.h you'll see the ocfs2_meta_lvb structure definition.
Just go ahead and take some of the reserved fields for our purposes.
Be sure to bump the value of OCFS2_LVB_VERSION so that node which don't
understand the new field know just to ignore the lvb.
The next step is to update __ocfs2_stuff_meta_lvb() to understand how to
write the values into the lvb, and ocfs2_refresh_inode_from_lvb() to take
them from the lvb into the inode.
We only need to stuff these new flags into the lvb btw, so don't worry about
the other ocfs2_dinode->i_flags - they're not nearly as dynamic and aren't
really checked anywhere we won't already be using the disk.
I'm thinking about whether we want to mark the existence of this feature in
the superblock (probably as a compat flag). More on this later I suppose.
> note: some attributes (like the noatime) do not (yet) make sense for
> OCFS2, It's just a generic approach ...
Right. What's up with the OCFS2_BARRIER_FL and OCFS2_IUNLINK flags? Am
I correct when I don't notice those in ext3?
Thanks!
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-11 17:45 ` Mark Fasheh
@ 2006-06-12 21:41 ` Herbert Poetzl
2006-06-13 23:56 ` Mark Fasheh
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Poetzl @ 2006-06-12 21:41 UTC (permalink / raw)
To: ocfs2-devel
On Sun, Jun 11, 2006 at 10:45:10AM -0700, Mark Fasheh wrote:
> Hey Herbert,
>
> On Sat, Jun 10, 2006 at 02:32:53PM +0200, Herbert Poetzl wrote:
> > did a first hack on adding the iattrs (ext2) to OCFS2. it is
> > currently for Linux-VServer patched kernels and I need to clean it
> > up a little more (and of course test it), but I thought I'd ask you
> > folks what you think about the general idea and the implementation.
> Well it definitely looks like things are going in a good direction :)
okay, so it looks like the ocfs2 folks are open to such
an extention which for me means that a mainline patch
makes sense ...
> How many of these attributes were you planning to implement by the
> way?
the good news here is that some of the attributes (like
the immutable or append) work out of the box once they
are there (i.e. present in the generic inode flags)
> I'm thinking we probably want to cover at least some of the basics -
> immutable, appends... Maybe the vfs already handles those for us -
> I'll have to read more code to figure that out. Patch comments below.
>
> > here is a link to the patch, if there is some
> > interest from your side, I'll prepare one for
> > mainline ...
> >
> > http://vserver.13thfloor.at/Experimental/delta-ocfs2-feat01.diff
> You probably want to move the cluster lock in the OCFS2_IOC_SETFLAGS
> case up a bit so that you can be sure your inode image is up to
> date within the cluster. Similarly, you'll probably want to take a
> read-only meta data lock in the OCFS2_IOC_GETFLAGS case.
okay, I moved those parts out of the ocfs2_ioctl
to match the general structure and enhance the
readabilitly - I hope I got the locks right this
time, please double check ...
> OCFS2 makes use of a small area associated with locks (called a Lock
> Value Block) to stuff commonly used inode information. This helps
> the file system avoid a disk read when updating inode information
> due to a new lock acquiry - ocfs2_meta_lock_update() is the function
> responsible for this. These new attributes should be put in the lvb as
> we'll want to be able to test their existence by checking the struct
> inode under cluster lock (as opposed to having to read off disk and
> checking the disk inode flags). Luckily this is pretty easy - you
> already handled the disk cases by updating ocfs2_populate_inode() and
> ocfs2_refresh_inode(). The lvb is a similar process. I'll outline the
> steps below.
>
> At the top of dlmglue.h you'll see the ocfs2_meta_lvb structure
> definition. Just go ahead and take some of the reserved fields for our
> purposes. Be sure to bump the value of OCFS2_LVB_VERSION so that node
> which don't understand the new field know just to ignore the lvb.
>
> The next step is to update __ocfs2_stuff_meta_lvb() to understand how
> to write the values into the lvb, and ocfs2_refresh_inode_from_lvb()
> to take them from the lvb into the inode.
>
> We only need to stuff these new flags into the lvb btw, so don't worry
> about the other ocfs2_dinode->i_flags - they're not nearly as dynamic
> and aren't really checked anywhere we won't already be using the disk.
okay, I added this and hopefully it will work as
expected, it didn't get exercised on my (up to now)
single node test setup :)
> I'm thinking about whether we want to mark the existence of this
> feature in the superblock (probably as a compat flag). More on this
> later I suppose.
I decided to make this patch a little more agressive
than the last one, using a reserved field of the
on disk structure to store the new attributes, which
actually has two advantages IMHO:
- the mapping between ext2/3 and ocfs2 isn't required
anymore, as we an store the ext attrs without fear
of collisions with existing ocfs2 flags
- the mixing and separation of ocfs2 flags and ext
flags can be avoided completely, probably enhancing
readability while simplifying code
feel free to oject here, in which case I will add back
the mixing/separation code ...
> > note: some attributes (like the noatime) do not (yet) make sense for
> > OCFS2, It's just a generic approach ...
> Right. What's up with the OCFS2_BARRIER_FL and OCFS2_IUNLINK flags?
ah, I forgot to mention, those are Linux-VServer
specific too, so they do not apply for mainline
(but they are the primary reason for adding those
attributes to ocfs2 :)
> Am I correct when I don't notice those in ext3?
yup, they are present in Linux-VServer though :)
> Thanks!
> --Mark
I also added the ioctl.h to match the ocfs2 coding
style more closely ... enough words now, here is
the new patch, this time inline for easier review
and comments ...
TIA,
Herbert
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/Makefile linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/Makefile
--- linux-2.6.16.20/fs/ocfs2/Makefile 2006-01-18 06:08:34 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/Makefile 2006-06-12 20:28:07 +0200
@@ -16,6 +16,7 @@ ocfs2-objs := \
file.o \
heartbeat.o \
inode.o \
+ ioctl.o \
journal.o \
localalloc.o \
mmap.o \
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/dlmglue.c linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/dlmglue.c
--- linux-2.6.16.20/fs/ocfs2/dlmglue.c 2006-01-18 06:08:34 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/dlmglue.c 2006-06-12 22:32:00 +0200
@@ -1321,6 +1321,7 @@ static void __ocfs2_stuff_meta_lvb(struc
lvb->lvb_version = cpu_to_be32(OCFS2_LVB_VERSION);
lvb->lvb_isize = cpu_to_be64(i_size_read(inode));
lvb->lvb_iclusters = cpu_to_be32(oi->ip_clusters);
+ lvb->lvb_iflags = cpu_to_be32(oi->ip_iflags);
lvb->lvb_iuid = cpu_to_be32(inode->i_uid);
lvb->lvb_igid = cpu_to_be32(inode->i_gid);
lvb->lvb_imode = cpu_to_be16(inode->i_mode);
@@ -1349,12 +1351,14 @@ static void ocfs2_refresh_inode_from_lvb
struct ocfs2_inode_info *oi = OCFS2_I(inode);
struct ocfs2_lock_res *lockres = &oi->ip_meta_lockres;
struct ocfs2_meta_lvb *lvb;
+ int version;
mlog_entry_void();
mlog_meta_lvb(0, lockres);
lvb = (struct ocfs2_meta_lvb *) lockres->l_lksb.lvb;
+ version = be32_to_cpu(lvb->lvb_version);
/* We're safe here without the lockres lock... */
spin_lock(&oi->ip_lock);
@@ -1368,6 +1372,16 @@ static void ocfs2_refresh_inode_from_lvb
inode->i_blocks =
ocfs2_align_bytes_to_sectors(i_size_read(inode));
+ switch (version) {
+ case 3:
+ oi->ip_iflags = be32_to_cpu(lvb->lvb_iflags);
+ printk("lvb refresh for %p flags %x\n", inode, oi->ip_iflags);
+ break;
+ default:
+ oi->ip_iflags = 0;
+ printk("lvb refresh for %p flags = 0\n", inode);
+ }
+
inode->i_uid = be32_to_cpu(lvb->lvb_iuid);
inode->i_gid = be32_to_cpu(lvb->lvb_igid);
inode->i_mode = be16_to_cpu(lvb->lvb_imode);
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/dlmglue.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/dlmglue.h
--- linux-2.6.16.20/fs/ocfs2/dlmglue.h 2006-01-18 06:08:34 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/dlmglue.h 2006-06-12 20:26:51 +0200
@@ -27,7 +27,7 @@
#ifndef DLMGLUE_H
#define DLMGLUE_H
-#define OCFS2_LVB_VERSION 2
+#define OCFS2_LVB_VERSION 3
struct ocfs2_meta_lvb {
__be32 lvb_version;
@@ -40,7 +40,8 @@ struct ocfs2_meta_lvb {
__be64 lvb_isize;
__be16 lvb_imode;
__be16 lvb_inlink;
- __be32 lvb_reserved[3];
+ __be32 lvb_iflags;
+ __be32 lvb_reserved[2];
};
/* ocfs2_meta_lock_full() and ocfs2_data_lock_full() 'arg_flags' flags */
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/file.c linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/file.c
--- linux-2.6.16.20/fs/ocfs2/file.c 2006-03-20 17:33:12 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/file.c 2006-06-12 21:18:13 +0200
@@ -44,6 +44,7 @@
#include "file.h"
#include "sysfile.h"
#include "inode.h"
+#include "ioctl.h"
#include "journal.h"
#include "mmap.h"
#include "suballoc.h"
@@ -1182,10 +1183,12 @@ struct file_operations ocfs2_fops = {
.open = ocfs2_file_open,
.aio_read = ocfs2_file_aio_read,
.aio_write = ocfs2_file_aio_write,
+ .ioctl = ocfs2_ioctl,
};
struct file_operations ocfs2_dops = {
.read = generic_read_dir,
.readdir = ocfs2_readdir,
.fsync = ocfs2_sync_file,
+ .ioctl = ocfs2_ioctl,
};
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/inode.c linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.c
--- linux-2.6.16.20/fs/ocfs2/inode.c 2006-03-20 17:33:12 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.c 2006-06-12 22:52:59 +0200
@@ -71,6 +71,26 @@ static int ocfs2_truncate_for_delete(str
struct inode *inode,
struct buffer_head *fe_bh);
+void ocfs2_set_inode_flags(struct inode *inode)
+{
+ unsigned int flags = OCFS2_I(inode)->ip_iflags;
+
+ inode->i_flags &= ~(S_IMMUTABLE |
+ S_SYNC | S_APPEND | S_NOATIME | S_DIRSYNC);
+
+ if (flags & OCFS2_IMMUTABLE_FL)
+ inode->i_flags |= S_IMMUTABLE;
+
+ if (flags & OCFS2_SYNC_FL)
+ inode->i_flags |= S_SYNC;
+ if (flags & OCFS2_APPEND_FL)
+ inode->i_flags |= S_APPEND;
+ if (flags & OCFS2_NOATIME_FL)
+ inode->i_flags |= S_NOATIME;
+ if (flags & OCFS2_DIRSYNC_FL)
+ inode->i_flags |= S_DIRSYNC;
+}
+
struct inode *ocfs2_ilookup_for_vote(struct ocfs2_super *osb,
u64 blkno,
int delete_vote)
@@ -258,7 +278,6 @@ int ocfs2_populate_inode(struct inode *i
inode->i_blocks =
ocfs2_align_bytes_to_sectors(le64_to_cpu(fe->i_size));
inode->i_mapping->a_ops = &ocfs2_aops;
- inode->i_flags |= S_NOATIME;
inode->i_atime.tv_sec = le64_to_cpu(fe->i_atime);
inode->i_atime.tv_nsec = le32_to_cpu(fe->i_atime_nsec);
inode->i_mtime.tv_sec = le64_to_cpu(fe->i_mtime);
@@ -273,6 +292,7 @@ int ocfs2_populate_inode(struct inode *i
OCFS2_I(inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
OCFS2_I(inode)->ip_orphaned_slot = OCFS2_INVALID_SLOT;
+ OCFS2_I(inode)->ip_iflags = le32_to_cpu(fe->i_iflags);
if (create_ino)
inode->i_ino = ino_from_blkno(inode->i_sb,
@@ -326,7 +346,8 @@ int ocfs2_populate_inode(struct inode *i
OCFS2_LOCK_TYPE_META, inode);
ocfs2_inode_lock_res_init(&OCFS2_I(inode)->ip_data_lockres,
OCFS2_LOCK_TYPE_DATA, inode);
-
+ ocfs2_set_inode_flags(inode);
+ inode->i_flags |= S_NOATIME;
status = 0;
bail:
mlog_exit(status);
@@ -1125,6 +1145,7 @@ int ocfs2_mark_inode_dirty(struct ocfs2_
spin_lock(&OCFS2_I(inode)->ip_lock);
fe->i_clusters = cpu_to_le32(OCFS2_I(inode)->ip_clusters);
+ fe->i_iflags = cpu_to_le32(OCFS2_I(inode)->ip_iflags);
spin_unlock(&OCFS2_I(inode)->ip_lock);
fe->i_size = cpu_to_le64(i_size_read(inode));
@@ -1163,6 +1184,9 @@ void ocfs2_refresh_inode(struct inode *i
spin_lock(&OCFS2_I(inode)->ip_lock);
OCFS2_I(inode)->ip_clusters = le32_to_cpu(fe->i_clusters);
+ OCFS2_I(inode)->ip_iflags = le32_to_cpu(fe->i_iflags);
+ ocfs2_set_inode_flags(inode);
+
i_size_write(inode, le64_to_cpu(fe->i_size));
inode->i_nlink = le16_to_cpu(fe->i_links_count);
inode->i_uid = le32_to_cpu(fe->i_uid);
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/inode.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h
--- linux-2.6.16.20/fs/ocfs2/inode.h 2006-04-09 13:49:54 +0200
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h 2006-06-12 21:08:39 +0200
@@ -56,6 +56,7 @@ struct ocfs2_inode_info
struct ocfs2_journal_handle *ip_handle;
u32 ip_flags; /* see below */
+ u32 ip_iflags; /* inode attributes */
/* protected by recovery_lock. */
struct inode *ip_next_orphan;
@@ -142,4 +143,6 @@ int ocfs2_mark_inode_dirty(struct ocfs2_
int ocfs2_aio_read(struct file *file, struct kiocb *req, struct iocb *iocb);
int ocfs2_aio_write(struct file *file, struct kiocb *req, struct iocb *iocb);
+void ocfs2_set_inode_flags(struct inode *inode);
+
#endif /* OCFS2_INODE_H */
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/ioctl.c linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ioctl.c
--- linux-2.6.16.20/fs/ocfs2/ioctl.c 1970-01-01 01:00:00 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ioctl.c 2006-06-12 22:58:50 +0200
@@ -0,0 +1,130 @@
+/*
+ * linux/fs/ocfs2/ioctl.c
+ *
+ * Copyright (C) 2006 Herbert Poetzl
+ * adapted from Remy Card's ext2/ioctl.c
+ */
+
+#include <linux/fs.h>
+#include <linux/mount.h>
+
+#define MLOG_MASK_PREFIX ML_INODE
+#include <cluster/masklog.h>
+
+#include "ocfs2.h"
+#include "alloc.h"
+#include "dlmglue.h"
+#include "inode.h"
+#include "journal.h"
+
+#include "ocfs2_fs.h"
+#include <linux/ext2_fs.h>
+
+int ocfs2_get_iflags(struct inode *inode, unsigned *flags)
+{
+ int status;
+
+ status = ocfs2_meta_lock(inode, NULL, NULL, 0);
+ if (status < 0) {
+ mlog_errno(status);
+ return status;
+ }
+ *flags = OCFS2_I(inode)->ip_iflags;
+ ocfs2_meta_unlock(inode, 0);
+
+ mlog_exit(status);
+ return status;
+}
+
+int ocfs2_set_iflags(struct inode *inode, unsigned flags, unsigned mask)
+{
+ struct ocfs2_inode_info *ocfs2_inode = OCFS2_I(inode);
+ struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+ struct ocfs2_journal_handle *handle = NULL;
+ struct buffer_head *bh = NULL;
+ unsigned oldflags;
+ int status;
+
+ status = ocfs2_meta_lock(inode, NULL, &bh, 1);
+ if (status < 0) {
+ mlog_errno(status);
+ goto bail;
+ }
+
+ status = -EROFS;
+ if (IS_RDONLY(inode))
+ goto bail_unlock;
+
+ status = -EACCES;
+ if ((current->fsuid != inode->i_uid) && !capable(CAP_FOWNER))
+ goto bail_unlock;
+
+ if (!S_ISDIR(inode->i_mode))
+ flags &= ~OCFS2_DIRSYNC_FL;
+
+ handle = ocfs2_start_trans(osb, NULL, OCFS2_INODE_UPDATE_CREDITS);
+ if (IS_ERR(handle)) {
+ status = PTR_ERR(handle);
+ mlog_errno(status);
+ goto bail_unlock;
+ }
+
+ oldflags = ocfs2_inode->ip_iflags;
+ flags = flags & mask;
+ flags |= oldflags & ~mask;
+
+ /*
+ * The IMMUTABLE and APPEND_ONLY flags can only be changed by
+ * the relevant capability.
+ */
+ status = -EPERM;
+ if ((oldflags & OCFS2_IMMUTABLE_FL) || ((flags ^ oldflags) &
+ (OCFS2_APPEND_FL | OCFS2_IMMUTABLE_FL))) {
+ if (!capable(CAP_LINUX_IMMUTABLE))
+ goto bail_unlock;
+ }
+
+ ocfs2_inode->ip_iflags = flags;
+ ocfs2_set_inode_flags(inode);
+
+ status = ocfs2_mark_inode_dirty(handle, inode, bh);
+ if (status < 0)
+ mlog_errno(status);
+
+ ocfs2_commit_trans(handle);
+bail_unlock:
+ ocfs2_meta_unlock(inode, 1);
+bail:
+ if (bh)
+ brelse(bh);
+
+ mlog_exit(status);
+ return status;
+}
+
+
+int ocfs2_ioctl(struct inode * inode, struct file * filp,
+ unsigned int cmd, unsigned long arg)
+{
+ unsigned int flags;
+ int status;
+
+ switch (cmd) {
+ case OCFS2_IOC_GETFLAGS:
+ status = ocfs2_get_iflags(inode, &flags);
+ if (status < 0)
+ return status;
+
+ flags &= OCFS2_FL_VISIBLE;
+ return put_user(flags, (int __user *) arg);
+ case OCFS2_IOC_SETFLAGS:
+ if (get_user(flags, (int __user *) arg))
+ return -EFAULT;
+
+ return ocfs2_set_iflags(inode, flags,
+ OCFS2_FL_MODIFIABLE);
+ default:
+ return -ENOTTY;
+ }
+}
+
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/ioctl.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ioctl.h
--- linux-2.6.16.20/fs/ocfs2/ioctl.h 1970-01-01 01:00:00 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ioctl.h 2006-06-12 21:17:50 +0200
@@ -0,0 +1,16 @@
+/*
+ * ioctl.h
+ *
+ * Function prototypes
+ *
+ * Copyright (C) 2006 Herbert Poetzl
+ *
+ */
+
+#ifndef OCFS2_IOCTL_H
+#define OCFS2_IOCTL_H
+
+int ocfs2_ioctl(struct inode * inode, struct file * filp,
+ unsigned int cmd, unsigned long arg);
+
+#endif /* OCFS2_IOCTL_H */
diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/ocfs2_fs.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ocfs2_fs.h
--- linux-2.6.16.20/fs/ocfs2/ocfs2_fs.h 2006-04-09 13:49:54 +0200
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/ocfs2_fs.h 2006-06-12 21:28:30 +0200
@@ -114,6 +114,26 @@
#define OCFS2_CHAIN_FL (0x00000400) /* Chain allocator */
#define OCFS2_DEALLOC_FL (0x00000800) /* Truncate log */
+/* Inode attributes, keep in sync with EXT2 */
+#define OCFS2_SECRM_FL (0x00000001) /* Secure deletion */
+#define OCFS2_UNRM_FL (0x00000002) /* Undelete */
+#define OCFS2_COMPR_FL (0x00000004) /* Compress file */
+#define OCFS2_SYNC_FL (0x00000008) /* Synchronous updates */
+#define OCFS2_IMMUTABLE_FL (0x00000010) /* Immutable file */
+#define OCFS2_APPEND_FL (0x00000020) /* writes to file may only append */
+#define OCFS2_NODUMP_FL (0x00000040) /* do not dump file */
+#define OCFS2_NOATIME_FL (0x00000080) /* do not update atime */
+#define OCFS2_DIRSYNC_FL (0x00010000) /* dirsync behaviour (directories only) */
+
+#define OCFS2_FL_VISIBLE (0x000100FF) /* User visible flags */
+#define OCFS2_FL_MODIFIABLE (0x000100FF) /* User modifiable flags */
+
+/*
+ * ioctl commands
+ */
+#define OCFS2_IOC_GETFLAGS _IOR('f', 1, long)
+#define OCFS2_IOC_SETFLAGS _IOW('f', 2, long)
+
/*
* Journal Flags (ocfs2_dinode.id1.journal1.i_flags)
*/
@@ -399,7 +419,9 @@ struct ocfs2_dinode {
__le32 i_atime_nsec;
__le32 i_ctime_nsec;
__le32 i_mtime_nsec;
-/*70*/ __le64 i_reserved1[9];
+/*70*/ __le32 i_iflags; /* inode attr flags */
+ __le32 i_reserved1;
+ __le64 i_reserved2[8];
/*B8*/ union {
__le64 i_pad1; /* Generic way to refer to this
64bit union */
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-12 21:41 ` Herbert Poetzl
@ 2006-06-13 23:56 ` Mark Fasheh
2006-06-14 10:44 ` Herbert Poetzl
0 siblings, 1 reply; 7+ messages in thread
From: Mark Fasheh @ 2006-06-13 23:56 UTC (permalink / raw)
To: ocfs2-devel
On Mon, Jun 12, 2006 at 11:41:17PM +0200, Herbert Poetzl wrote:
> okay, so it looks like the ocfs2 folks are open to such
> an extention which for me means that a mainline patch
> makes sense ...
Yes - thanks for following up on this :)
> the good news here is that some of the attributes (like
> the immutable or append) work out of the box once they
> are there (i.e. present in the generic inode flags)
Yeah, it should mostly work just fine, but I'm betting there are some paths
which will need a manual re-check as things may have changed out from under
the VFS while we weren't holding a cluster lock. It should be easy enough to
check things like the immutable flag in those places though.
> okay, I moved those parts out of the ocfs2_ioctl
> to match the general structure and enhance the
> readabilitly - I hope I got the locks right this
> time, please double check ...
The locking looks good, thanks. The ioctl.[ch] parts of the patch looks
pretty good now btw.
> okay, I added this and hopefully it will work as
> expected, it didn't get exercised on my (up to now)
> single node test setup :)
Cool. As much testing as you can do is good :) At any rate, we'll certainly
be testing things out over here once the patch is good for inclusion.
There is a small problem with the lvb stuff which I'll outline below.
> I decided to make this patch a little more agressive
> than the last one, using a reserved field of the
> on disk structure to store the new attributes, which
> actually has two advantages IMHO:
>
> - the mapping between ext2/3 and ocfs2 isn't required
> anymore, as we an store the ext attrs without fear
> of collisions with existing ocfs2 flags
>
> - the mixing and separation of ocfs2 flags and ext
> flags can be avoided completely, probably enhancing
> readability while simplifying code
I agree that using a new field seems nicer. Can we compress things though
and just use a u16? I don't mind the translation that we were doing before,
so you can go ahead and pack the actual bit values for OCFS2's purposes.
Also I would prefer if you could split the i_reserved0 __le32 into a pair of
__le16 values:
-/*10*/ __le32 i_reserved0;
+/*10*/ __le16 i_attr_flags;
+ __le16 i_reserved0;
Actually, looking at the values, it seems that OCFS2_DIRSYNC_FL is the only
one that won't fit within 16 bits, so you could just give that one some new
value and we'll only have to translate on one flag :)
> here is the new patch, this time inline for easier review and comments ...
Great, thanks - more comments inline.
> @@ -1368,6 +1372,16 @@ static void ocfs2_refresh_inode_from_lvb
> inode->i_blocks =
> ocfs2_align_bytes_to_sectors(i_size_read(inode));
>
> + switch (version) {
> + case 3:
> + oi->ip_iflags = be32_to_cpu(lvb->lvb_iflags);
> + printk("lvb refresh for %p flags %x\n", inode, oi->ip_iflags);
> + break;
> + default:
> + oi->ip_iflags = 0;
> + printk("lvb refresh for %p flags = 0\n", inode);
> + }
You won't ever hit the 'default' case because ocfs2_meta_lvb_is_trustable()
will always return false. Which is what we probably want in this case
because if you're going to allow nodes which don't understand the attributes
to be in the cluster we don't want to trust their lvbs (and instead read
from disk) as they won't have put anything useful in the flags lvb field.
> inode->i_uid = be32_to_cpu(lvb->lvb_iuid);
> inode->i_gid = be32_to_cpu(lvb->lvb_igid);
> inode->i_mode = be16_to_cpu(lvb->lvb_imode);
Hmm, you also left out the corresponding code in __ocfs2_stuff_meta_lvb()
I'll attach a quick, untested patch against dlmglue.[ch] which you can
hopefully use.
> diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/inode.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h
> --- linux-2.6.16.20/fs/ocfs2/inode.h 2006-04-09 13:49:54 +0200
> +++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h 2006-06-12 21:08:39 +0200
> @@ -56,6 +56,7 @@ struct ocfs2_inode_info
> struct ocfs2_journal_handle *ip_handle;
>
> u32 ip_flags; /* see below */
> + u32 ip_iflags; /* inode attributes */
Minor nit, but would you mind changing the name to something like
ip_attr_flags? I know - I'm the one who brought up that not-so-good-name in
the first place :) Maybe we can just start saying attr_flags instead of
iflags elsewhere too? I apologize for the poor 1st pass at naming this thing
:)
> -/*70*/ __le64 i_reserved1[9];
> +/*70*/ __le32 i_iflags; /* inode attr flags */
> + __le32 i_reserved1;
> + __le64 i_reserved2[8];
I guess just see my comment above on where to locate this.
So IMHO, and you can probably tell this by the lack of patch comments that
things are coming along pretty well. Thanks again for doing all this :)
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-13 23:56 ` Mark Fasheh
@ 2006-06-14 10:44 ` Herbert Poetzl
2006-06-14 18:54 ` Mark Fasheh
0 siblings, 1 reply; 7+ messages in thread
From: Herbert Poetzl @ 2006-06-14 10:44 UTC (permalink / raw)
To: ocfs2-devel
On Tue, Jun 13, 2006 at 04:56:25PM -0700, Mark Fasheh wrote:
> On Mon, Jun 12, 2006 at 11:41:17PM +0200, Herbert Poetzl wrote:
> > okay, so it looks like the ocfs2 folks are open to such
> > an extention which for me means that a mainline patch
> > makes sense ...
> Yes - thanks for following up on this :)
always good to hear ...
> > the good news here is that some of the attributes (like
> > the immutable or append) work out of the box once they
> > are there (i.e. present in the generic inode flags)
> Yeah, it should mostly work just fine, but I'm betting there are some
> paths which will need a manual re-check as things may have changed out
> from under the VFS while we weren't holding a cluster lock. It should
> be easy enough to check things like the immutable flag in those places
> though.
might be, I already wondered why things like
mark_inode_dirty() do not work with ocfs2,
IMHO the vfs/ocfs2 synchronization is broken
(to some degree) but maybe that is intentional
> > okay, I moved those parts out of the ocfs2_ioctl
> > to match the general structure and enhance the
> > readabilitly - I hope I got the locks right this
> > time, please double check ...
> The locking looks good, thanks. The ioctl.[ch] parts of the patch
> looks pretty good now btw.
okay, good
> > okay, I added this and hopefully it will work as
> > expected, it didn't get exercised on my (up to now)
> > single node test setup :)
> Cool. As much testing as you can do is good :) At any rate, we'll
> certainly be testing things out over here once the patch is good for
> inclusion.
guess it will get more testing once I set it up
as a cluster system
> There is a small problem with the lvb stuff which I'll outline below.
>
> > I decided to make this patch a little more agressive
> > than the last one, using a reserved field of the
> > on disk structure to store the new attributes, which
> > actually has two advantages IMHO:
> >
> > - the mapping between ext2/3 and ocfs2 isn't required
> > anymore, as we an store the ext attrs without fear
> > of collisions with existing ocfs2 flags
> >
> > - the mixing and separation of ocfs2 flags and ext
> > flags can be avoided completely, probably enhancing
> > readability while simplifying code
> I agree that using a new field seems nicer.
> Can we compress things though and just use a u16?
sure we can, but I think we should either use
the flags _as is_ and do not map/change anything
or compress them into the existing i_flags,
avoiding the new on disk field completely
> I don't mind the translation that we were doing before, so you can go
> ahead and pack the actual bit values for OCFS2's purposes.
> Also I would prefer if you could split the i_reserved0 __le32 into
> a pair of __le16 values:
>
> -/*10*/ __le32 i_reserved0;
> +/*10*/ __le16 i_attr_flags;
> + __le16 i_reserved0;
>
> Actually, looking at the values, it seems that OCFS2_DIRSYNC_FL is the
> only one that won't fit within 16 bits, so you could just give that
> one some new value and we'll only have to translate on one flag :)
I'd prefer to use the flag by flag mapping instead
of some magic shift/mask which doesn't actually
gain that much
> > here is the new patch, this time inline for easier review and comments ...
> Great, thanks - more comments inline.
>
> > @@ -1368,6 +1372,16 @@ static void ocfs2_refresh_inode_from_lvb
> > inode->i_blocks =
> > ocfs2_align_bytes_to_sectors(i_size_read(inode));
> >
> > + switch (version) {
> > + case 3:
> > + oi->ip_iflags = be32_to_cpu(lvb->lvb_iflags);
> > + printk("lvb refresh for %p flags %x\n", inode, oi->ip_iflags);
> > + break;
> > + default:
> > + oi->ip_iflags = 0;
> > + printk("lvb refresh for %p flags = 0\n", inode);
> > + }
> You won't ever hit the 'default' case because
> ocfs2_meta_lvb_is_trustable() will always return false. Which is what
> we probably want in this case because if you're going to allow nodes
> which don't understand the attributes to be in the cluster we don't
> want to trust their lvbs (and instead read from disk) as they won't
> have put anything useful in the flags lvb field.
ah, okay, good to know, so we simply reduce that to
oi->ip_iflags = be32_to_cpu(lvb->lvb_iflags), yes?
> > inode->i_uid = be32_to_cpu(lvb->lvb_iuid);
> > inode->i_gid = be32_to_cpu(lvb->lvb_igid);
> > inode->i_mode = be16_to_cpu(lvb->lvb_imode);
> Hmm, you also left out the corresponding code in __ocfs2_stuff_meta_lvb()
hmm, I did? .. checking ... hmm, what about this part?
--- linux-2.6.16.20/fs/ocfs2/dlmglue.c 2006-01-18 06:08:34 +0100
+++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/dlmglue.c 2006-06-12 22:32:00 +0200
@@ -1321,6 +1321,7 @@ static void __ocfs2_stuff_meta_lvb(struc
lvb->lvb_version = cpu_to_be32(OCFS2_LVB_VERSION);
lvb->lvb_isize = cpu_to_be64(i_size_read(inode));
lvb->lvb_iclusters = cpu_to_be32(oi->ip_clusters);
+ lvb->lvb_iflags = cpu_to_be32(oi->ip_iflags);
lvb->lvb_iuid = cpu_to_be32(inode->i_uid);
lvb->lvb_igid = cpu_to_be32(inode->i_gid);
lvb->lvb_imode = cpu_to_be16(inode->i_mode);
do we need more?
> I'll attach a quick, untested patch against dlmglue.[ch] which you can
> hopefully use.
okay, will check ...
> > diff -NurpP --minimal linux-2.6.16.20/fs/ocfs2/inode.h linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h
> > --- linux-2.6.16.20/fs/ocfs2/inode.h 2006-04-09 13:49:54 +0200
> > +++ linux-2.6.16.20-ocfs2-0.01/fs/ocfs2/inode.h 2006-06-12 21:08:39 +0200
> > @@ -56,6 +56,7 @@ struct ocfs2_inode_info
> > struct ocfs2_journal_handle *ip_handle;
> >
> > u32 ip_flags; /* see below */
> > + u32 ip_iflags; /* inode attributes */
> Minor nit, but would you mind changing the name to something
> like ip_attr_flags? I know - I'm the one who brought up that
> not-so-good-name in the first place :) Maybe we can just start saying
> attr_flags instead of iflags elsewhere too? I apologize for the poor
> 1st pass at naming this thing :)
well, I actually used ip_iattr in the second patch
I did, but this actually makes things even more
confusing, as for example the functions would then
look like this:
+void ocfs2_set_inode_flags(struct inode *inode)
+{
+ unsigned int flags = OCFS2_I(inode)->ip_iattr;
+
+ inode->i_flags &= ...
and this:
+int ocfs2_get_iattr(struct inode *inode, unsigned *flags)
+int ocfs2_set_iattr(struct inode *inode, unsigned flags, unsigned mask)
which IMHO is even more confusing, but, I think we
won't have this name clash if we stick to mangling
the existing ip_flags with the new ones ...
> > -/*70*/ __le64 i_reserved1[9];
> > +/*70*/ __le32 i_iflags; /* inode attr flags */
> > + __le32 i_reserved1;
> > + __le64 i_reserved2[8];
> I guess just see my comment above on where to locate this.
>
>
> So IMHO, and you can probably tell this by the lack of patch comments
> that things are coming along pretty well. Thanks again for doing all
> this :)
you're welcome! and tx for the quick feedback!
will prepare a patch with the complete mangling
added back (in a slightly more readable way)
best,
Herbert
PS: did I miss the quickly attached patch? :)
> --Mark
>
> --
> Mark Fasheh
> Senior Software Developer, Oracle
> mark.fasheh at oracle.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-14 10:44 ` Herbert Poetzl
@ 2006-06-14 18:54 ` Mark Fasheh
2006-06-14 19:08 ` Daniel Phillips
0 siblings, 1 reply; 7+ messages in thread
From: Mark Fasheh @ 2006-06-14 18:54 UTC (permalink / raw)
To: ocfs2-devel
On Wed, Jun 14, 2006 at 12:44:00PM +0200, Herbert Poetzl wrote:
> > Yeah, it should mostly work just fine, but I'm betting there are some
> > paths which will need a manual re-check as things may have changed out
> > from under the VFS while we weren't holding a cluster lock. It should
> > be easy enough to check things like the immutable flag in those places
> > though.
>
> might be, I already wondered why things like
> mark_inode_dirty() do not work with ocfs2,
> IMHO the vfs/ocfs2 synchronization is broken
> (to some degree) but maybe that is intentional
Yeah, it's difficult to get the VFS to "cluster lock" without affecting the
runtime of other file systems, so we have to work within that constraint :/
> > I agree that using a new field seems nicer.
>
> > Can we compress things though and just use a u16?
>
> sure we can, but I think we should either use
> the flags _as is_ and do not map/change anything
> or compress them into the existing i_flags,
> avoiding the new on disk field completely
No, I'd rather not use the existing i_flags, though I never really explained
why.
Basically that field (i_flags) should be saved for bits "internal" to the
file system things that the user can't directly change or see. That's why I
thought using a new field was such a good idea. Now, the size of the field
is something I'd like to minimize - there's no sense using a full 32 bits
for something that will only ever use nine of them.
Please keep in mind that this has implications for things other than just
the disk inode structure. LVB space is _far_ more limited for example - only
64 bytes. Using 16 bits there instead of the full 32 will definitely show up
as we're adding features which require more fields there.
Now, you correctly point out that using anything less than a u32 will
require some level of mapping to ext2, which can be ugly. And I agree that
it's ugly, but it's less important than the other two issues thus far
outlined. Luckily, the ext2 mapping part just has to happen in ioctl.c, so
at least we're not polluting the rest of the code with it :)
> I'd prefer to use the flag by flag mapping instead
> of some magic shift/mask which doesn't actually
> gain that much
That's perfectly fine with me. I agree that we might as well mangle them as
cleanly as possible :)
> > You won't ever hit the 'default' case because
> > ocfs2_meta_lvb_is_trustable() will always return false. Which is what
> > we probably want in this case because if you're going to allow nodes
> > which don't understand the attributes to be in the cluster we don't
> > want to trust their lvbs (and instead read from disk) as they won't
> > have put anything useful in the flags lvb field.
>
> ah, okay, good to know, so we simply reduce that to
> oi->ip_iflags = be32_to_cpu(lvb->lvb_iflags), yes?
Right.
> > Hmm, you also left out the corresponding code in __ocfs2_stuff_meta_lvb()
>
> hmm, I did? .. checking ... hmm, what about this part?
Oops, sorry - for some reason I didn't see that on my last reading!
> well, I actually used ip_iattr in the second patch
> I did, but this actually makes things even more
> confusing, as for example the functions would then
> look like this:
>
> +void ocfs2_set_inode_flags(struct inode *inode)
> +{
> + unsigned int flags = OCFS2_I(inode)->ip_iattr;
> +
> + inode->i_flags &= ...
>
> and this:
>
> +int ocfs2_get_iattr(struct inode *inode, unsigned *flags)
> +int ocfs2_set_iattr(struct inode *inode, unsigned flags, unsigned mask)
How about ocfs2_get_attr_flags(...) and ocfs2_set_attr_flags(...)? Anyway,
this isn't a big deal or anything :)
> > So IMHO, and you can probably tell this by the lack of patch comments
> > that things are coming along pretty well. Thanks again for doing all
> > this :)
>
> you're welcome! and tx for the quick feedback!
No problem :)
> PS: did I miss the quickly attached patch? :)
Eh, did I forget that too? Heh, well I'll attach this time around so you get
an idea of what I was talking about, but you pretty much got it, save for
field sizes, and adding to a debug function.
Thanks,
--Mark
--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index d57860d..774a7a7 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1330,6 +1330,7 @@ static void __ocfs2_stuff_meta_lvb(struc
cpu_to_be64(ocfs2_pack_timespec(&inode->i_ctime));
lvb->lvb_imtime_packed =
cpu_to_be64(ocfs2_pack_timespec(&inode->i_mtime));
+ lvb->lvb_iattr_flags = cpu_to_be16(oi->ip_attr_flags);
mlog_meta_lvb(0, lockres);
@@ -1377,6 +1378,7 @@ static void ocfs2_refresh_inode_from_lvb
be64_to_cpu(lvb->lvb_imtime_packed));
ocfs2_unpack_timespec(&inode->i_ctime,
be64_to_cpu(lvb->lvb_ictime_packed));
+ oi->ip_attr_flags = be16_to_cpu(lvb->lvb_iattr_flags);
spin_unlock(&oi->ip_lock);
mlog_exit_void();
@@ -2908,8 +2910,9 @@ void ocfs2_dump_meta_lvb_info(u64 level,
be32_to_cpu(lvb->lvb_iuid), be32_to_cpu(lvb->lvb_igid),
be16_to_cpu(lvb->lvb_imode));
mlog(level, "nlink %u, atime_packed 0x%llx, ctime_packed 0x%llx, "
- "mtime_packed 0x%llx\n", be16_to_cpu(lvb->lvb_inlink),
+ "mtime_packed 0x%llx, attr 0x%x\n", be16_to_cpu(lvb->lvb_inlink),
(long long)be64_to_cpu(lvb->lvb_iatime_packed),
(long long)be64_to_cpu(lvb->lvb_ictime_packed),
- (long long)be64_to_cpu(lvb->lvb_imtime_packed));
+ (long long)be64_to_cpu(lvb->lvb_imtime_packed),
+ be16_to_cpu(lvb->lvb_iattr_flags));
}
diff --git a/fs/ocfs2/dlmglue.h b/fs/ocfs2/dlmglue.h
index 8f2d1db..cfb06cb 100644
--- a/fs/ocfs2/dlmglue.h
+++ b/fs/ocfs2/dlmglue.h
@@ -27,7 +27,7 @@
#ifndef DLMGLUE_H
#define DLMGLUE_H
-#define OCFS2_LVB_VERSION 2
+#define OCFS2_LVB_VERSION 3
struct ocfs2_meta_lvb {
__be32 lvb_version;
@@ -40,7 +40,9 @@ struct ocfs2_meta_lvb {
__be64 lvb_isize;
__be16 lvb_imode;
__be16 lvb_inlink;
- __be32 lvb_reserved[3];
+ __be16 lvb_iattr_flags;
+ __be16 lvb_reserved0;
+ __be32 lvb_reserved1[2];
};
/* ocfs2_meta_lock_full() and ocfs2_data_lock_full() 'arg_flags' flags */
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [Ocfs2-devel] ext2 attributes for OCFS2
2006-06-14 18:54 ` Mark Fasheh
@ 2006-06-14 19:08 ` Daniel Phillips
0 siblings, 0 replies; 7+ messages in thread
From: Daniel Phillips @ 2006-06-14 19:08 UTC (permalink / raw)
To: ocfs2-devel
Mark Fasheh wrote:
> On Wed, Jun 14, 2006 at 12:44:00PM +0200, Herbert Poetzl wrote:
>>>Yeah, it should mostly work just fine, but I'm betting there are some
>>>paths which will need a manual re-check as things may have changed out
>>>from under the VFS while we weren't holding a cluster lock. It should
>>>be easy enough to check things like the immutable flag in those places
>>>though.
>>
>>might be, I already wondered why things like
>>mark_inode_dirty() do not work with ocfs2,
>>IMHO the vfs/ocfs2 synchronization is broken
>>(to some degree) but maybe that is intentional
>
> Yeah, it's difficult to get the VFS to "cluster lock" without affecting the
> runtime of other file systems, so we have to work within that constraint :/
Please don't be shy about asking. Can you describe the mark_inode_dirty
issue?
Regards,
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-06-14 19:08 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-10 12:32 [Ocfs2-devel] ext2 attributes for OCFS2 Herbert Poetzl
2006-06-11 17:45 ` Mark Fasheh
2006-06-12 21:41 ` Herbert Poetzl
2006-06-13 23:56 ` Mark Fasheh
2006-06-14 10:44 ` Herbert Poetzl
2006-06-14 18:54 ` Mark Fasheh
2006-06-14 19:08 ` Daniel Phillips
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.