[PATCH 0/6] RFC: introduce extended inode owner identifier v4

* [PATCH 0/6] RFC: introduce extended inode owner identifier v4
@ 2010-02-18 16:45 Dmitry Monakhov
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
  2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov

This is a new attempt to add an extended inode identifier.
In previous posts it was called tree_id, subtree_id and project_id,
but none of these names was good enough. I've dropped project_id
because it is a well-known XFS feature, and my implementation
differs slightly from it, especially from the user-space point of view.
To avoid ambiguity I've settled on the term "metagroup".
I hope this is the final name for the feature.

*Feature description*
1) An inode may have a metagroup identifier, which has the same meaning
   as uid/gid.
2) The id is stored in the inode's xattr named "system.metagroup".
3) The id is inherited from the parent inode on creation.
4) The id is cached in the in-memory inode structure (inside
   fsprivate_inode) and is accessible from the vfs layer.
5) Since the id is cached in memory it may be used for different purposes,
   such as:
5A) Implementing an additional quota id space orthogonal to uid/gid. This
    is useful for managing quota for a filesystem hierarchy (chroot or
    container over a bind mount).
5B) Exporting a dedicated fs hierarchy to nfsd (only inodes which have a
    given metagroup will be accessible via nfsd).
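
A minimal user-space sketch of how the id from point 2 might be set, assuming
a Linux system and a filesystem mounted with metagroup support. The helper
names metagroup_format() and metagroup_set() are hypothetical. The value is
passed as decimal text because the ext4 set handler in patch 6/6 parses it
with simple_strtoul(); including the trailing NUL in the value is an
assumption made here so that the parser sees a terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* xattr name taken from the cover letter */
#define METAGROUP_XATTR "system.metagroup"

/*
 * Hypothetical helper: format the id as decimal text.
 * Returns the string length, or -1 if buf is too small.
 */
int metagroup_format(unsigned int id, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "%u", id);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: assign a metagroup id to a path.
 * Returns 0 on success, -1 on error (errno is set by setxattr).
 */
int metagroup_set(const char *path, unsigned int id)
{
	char buf[16];
	int n = metagroup_format(id, buf, sizeof(buf));

	if (n < 0)
		return -1;
	/* n + 1: pass the NUL terminator as well (assumption, see above) */
	return setxattr(path, METAGROUP_XATTR, buf, (size_t)n + 1, 0);
}
```

On a filesystem without a handler for this name, metagroup_set() is expected
to fail, so callers should check the return value and errno.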

*Implementation details*

It is unlikely that everybody will be happy to have a new field in
vfs_inode that is not widely used. That's why this field is
stored inside private_inode.

But we still need access to this private field.
A similar issue first came up while implementing the
generic quota reserved_space management interface.
Jan suggested implementing some sort of auxiliary inode attribute map
and accessing non-standard inode attributes via this aux_attr_map.
I've implemented this idea in the form of a per-sb aux_attribute table.
(Macros are not a good fit here because different attributes may have
different types, which would result in massive typecasting.)
If someone has a better idea, please speak up.
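
The idea can be sketched in plain user-space C as follows. This is only a
model under stated assumptions: the structures are cut down to the fields
needed for the illustration, qsize_t is a stand-in typedef, and "myfs" is a
hypothetical filesystem. It shows the generic layer reaching a field that
lives in the filesystem-private inode through a per-sb table of accessors.

```c
#include <assert.h>
#include <stddef.h>

typedef long long qsize_t;	/* stand-in for the kernel type */

struct inode;

/* Per-superblock table of optional accessors, modelled on patches 1-2 */
struct aux_attributes {
	qsize_t *(*reserved_space)(struct inode *inode);
};

struct super_block {
	const struct aux_attributes *s_aux_attr;
};

struct inode {
	struct super_block *i_sb;
};

/* Hypothetical fs-private inode that embeds the generic inode */
struct myfs_inode_info {
	qsize_t i_reserved_quota;
	struct inode vfs_inode;
};

/* The filesystem knows where its private field lives */
static qsize_t *myfs_reserved_space(struct inode *inode)
{
	struct myfs_inode_info *mi = (struct myfs_inode_info *)
		((char *)inode - offsetof(struct myfs_inode_info, vfs_inode));

	return &mi->i_reserved_quota;
}

const struct aux_attributes myfs_aux_attr = {
	.reserved_space = myfs_reserved_space,
};

/* Generic code: 0 if the fs exports no table or no such accessor */
qsize_t inode_get_rsv_space(struct inode *inode)
{
	const struct aux_attributes *aux;

	assert(inode != NULL && inode->i_sb != NULL);
	aux = inode->i_sb->s_aux_attr;
	if (!aux || !aux->reserved_space)
		return 0;
	return *aux->reserved_space(inode);
}
```

Returning a pointer rather than a value lets the generic code both read and
update the field, which is how the quota code uses reserved_space.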

To give an overview of this interface I've converted
quota's reserved space interface to the new aux_attr_table.

Once we have a generic interface for auxiliary attributes,
each filesystem may implement metagroup support in its own manner.

This should be done in the following steps:
1) Add a field to private_inode and export it via aux_attribute.
2) Implement id inheritance on inode creation.
3) Implement a handler for the "system.metagroup" xattr.
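
A rough user-space model of steps 2 and 3, under stated assumptions:
struct demo_inode and both function names are hypothetical, and the raw
buffer stands in for the on-disk xattr value. It mirrors what the ext4
example does: the id is stored as a 32-bit little-endian value, a missing
xattr means id 0, and a new inode copies the id of its parent directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fs-private inode */
struct demo_inode {
	uint32_t i_mid;		/* cached metagroup id */
};

/* Step 2: a newly created inode inherits the id of its parent directory */
void demo_inherit_mid(struct demo_inode *child, const struct demo_inode *dir)
{
	assert(child != NULL && dir != NULL);
	child->i_mid = dir->i_mid;
}

/*
 * Step 3 (read side): decode the stored value into the cached field.
 * len == 0 models a missing xattr and yields id 0; any length other
 * than 4 is treated as corruption. Returns 0 on success, -1 on error.
 */
int demo_load_mid(struct demo_inode *inode, const unsigned char *raw,
		  size_t len)
{
	assert(inode != NULL);
	if (len == 0) {
		inode->i_mid = 0;
		return 0;
	}
	if (len != 4 || raw == NULL)
		return -1;
	/* little-endian decode, independent of host byte order */
	inode->i_mid = (uint32_t)raw[0] |
		       (uint32_t)raw[1] << 8 |
		       (uint32_t)raw[2] << 16 |
		       (uint32_t)raw[3] << 24;
	return 0;
}
```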

This patch set contains an example implementation of this for ext4.

The patch set is compile-tested only.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 12b1aa1..c9aef25 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 33
-EXTRAVERSION = -rc8
+EXTRAVERSION = -rc8-metagroup
 NAME = Man-Eating Seals of Antiquity

 # *DOCUMENTATION*
-- 
1.6.6


* [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 16:45 [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dmitry Monakhov
@ 2010-02-18 16:45 ` Dmitry Monakhov
  2010-02-18 16:45   ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
  2010-02-18 19:00   ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
  2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov

Sometimes it is useful to export non-standard attributes
to the generic vfs layer, but it is too expensive to store them
inside the vfs inode. Let's introduce a generic interface for this
purpose. One may declare an attribute and the filesystem provides
access to it, if necessary.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 include/linux/fs.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b1bcb27..c510ef7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -384,6 +384,7 @@ struct inodes_stat_t {
 #include <asm/byteorder.h>
 
 struct export_operations;
+struct aux_attributes;
 struct hd_geometry;
 struct iovec;
 struct nameidata;
@@ -1323,6 +1324,7 @@ struct super_block {
 	const struct dquot_operations	*dq_op;
 	const struct quotactl_ops	*s_qcop;
 	const struct export_operations *s_export_op;
+	const struct aux_attributes *s_aux_attr;
 	unsigned long		s_flags;
 	unsigned long		s_magic;
 	struct dentry		*s_root;
@@ -1576,7 +1578,10 @@ struct super_operations {
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
 };
-
+struct aux_attributes
+{
+	int supported;
+};
 /*
  * Inode state bits.  Protected by inode_lock.
  *
-- 
1.6.6



* [PATCH 2/6] quota: switch reservation space management to aux_attribute
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
@ 2010-02-18 16:45   ` Dmitry Monakhov
  2010-02-18 16:45     ` [PATCH 3/6] vfs: Add additional owner identifier Dmitry Monakhov
  2010-02-18 19:00   ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
  1 sibling, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/super.c       |   11 +++++++----
 fs/quota/dquot.c      |    7 ++++---
 include/linux/fs.h    |    5 +++++
 include/linux/quota.h |    3 ---
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 735c20d..84a51d9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1018,9 +1018,6 @@ static const struct dquot_operations ext4_quota_operations = {
 	.reserve_space	= dquot_reserve_space,
 	.claim_space	= dquot_claim_space,
 	.release_rsv	= dquot_release_reserved_space,
-#ifdef CONFIG_QUOTA
-	.get_reserved_space = ext4_get_reserved_space,
-#endif
 	.alloc_inode	= dquot_alloc_inode,
 	.free_space	= dquot_free_space,
 	.free_inode	= dquot_free_inode,
@@ -1033,7 +1030,13 @@ static const struct dquot_operations ext4_quota_operations = {
 	.alloc_dquot	= dquot_alloc,
 	.destroy_dquot	= dquot_destroy,
 };
-
+static const struct aux_attributes ext4_aux_attr =
+{
+	.supported = 1,
+#ifdef CONFIG_QUOTA
+	.reserved_space = ext4_get_reserved_space,
+#endif
+};
 static const struct quotactl_ops ext4_qctl_operations = {
 	.quota_on	= ext4_quota_on,
 	.quota_off	= vfs_quota_off,
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 4d2041f..de4b8fc 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1405,8 +1405,8 @@ static qsize_t *inode_reserved_space(struct inode * inode)
 {
 	/* Filesystem must explicitly define it's own method in order to use
 	 * quota reservation interface */
-	BUG_ON(!inode->i_sb->dq_op->get_reserved_space);
-	return inode->i_sb->dq_op->get_reserved_space(inode);
+	BUG_ON(!inode->i_sb->s_aux_attr->reserved_space);
+	return inode->i_sb->s_aux_attr->reserved_space(inode);
 }
 
 void inode_add_rsv_space(struct inode *inode, qsize_t number)
@@ -1438,7 +1438,8 @@ static qsize_t inode_get_rsv_space(struct inode *inode)
 {
 	qsize_t ret;
 
-	if (!inode->i_sb->dq_op->get_reserved_space)
+	if (!inode->i_sb->s_aux_attr ||
+		!inode->i_sb->s_aux_attr->reserved_space)
 		return 0;
 	spin_lock(&inode->i_lock);
 	ret = *inode_reserved_space(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c510ef7..0cd0105 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1581,6 +1581,11 @@ struct super_operations {
 struct aux_attributes
 {
 	int supported;
+#ifdef CONFIG_QUOTA
+	/* Delay allocation space reservation  managed internally by quota,
+	 * and protected by i_lock similar to i_blocks+i_bytes. */
+	qsize_t* (*reserved_space)(struct inode *inode);
+#endif
 };
 /*
  * Inode state bits.  Protected by inode_lock.
diff --git a/include/linux/quota.h b/include/linux/quota.h
index edf34f2..680605d 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -315,9 +315,6 @@ struct dquot_operations {
 	int (*claim_space) (struct inode *, qsize_t);
 	/* release rsved quota for delayed alloc */
 	void (*release_rsv) (struct inode *, qsize_t);
-	/* get reserved quota for delayed alloc, value returned is managed by
-	 * quota code only */
-	qsize_t *(*get_reserved_space) (struct inode *);
 };
 
 /* Operations handling requests from userspace */
-- 
1.6.6



* [PATCH 3/6] vfs: Add additional owner identifier
  2010-02-18 16:45   ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
@ 2010-02-18 16:45     ` Dmitry Monakhov
  2010-02-18 16:45       ` [PATCH 4/6] quota: Implement metagroup support for quota Dmitry Monakhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/Kconfig            |    7 +++++++
 include/linux/fs.h    |    5 +++++
 include/linux/xattr.h |    3 +++
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 64d44ef..ad47589 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -54,6 +54,13 @@ config FILE_LOCKING
 	  This option enables standard file locking support, required
           for filesystems like NFS and for the flock() system
           call. Disabling this option saves about 11k.
+config METAGROUP
+	bool "Enable metagroup inode identifier"
+	default y
+	help
+	  This option enables metagroup inode identifier. Metagroup
+	  may be used as auxiliary owner specifier in addition to
+	  standard uid/gid.
 
 source "fs/notify/Kconfig"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0cd0105..f1139ed 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1586,6 +1586,11 @@ struct aux_attributes
 	 * and protected by i_lock similar to i_blocks+i_bytes. */
 	qsize_t* (*reserved_space)(struct inode *inode);
 #endif
+#ifdef CONFIG_METAGROUP
+	/* Metagroup id, protected by i_mutex similar to i_uid/i_gid*/
+	uid_t* (*metagroup)(struct inode *inode);
+#endif
+
 };
 /*
  * Inode state bits.  Protected by inode_lock.
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index fb9b7e6..efd9ed1 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -33,6 +33,9 @@
 #define XATTR_USER_PREFIX "user."
 #define XATTR_USER_PREFIX_LEN (sizeof (XATTR_USER_PREFIX) - 1)
 
+#define XATTR_METAGROUP "system.metagroup"
+#define XATTR_METAGROUP_LEN (sizeof (XATTR_METAGROUP))
+
 struct inode;
 struct dentry;
 
-- 
1.6.6



* [PATCH 4/6] quota: Implement metagroup support for quota
  2010-02-18 16:45     ` [PATCH 3/6] vfs: Add additional owner identifier Dmitry Monakhov
@ 2010-02-18 16:45       ` Dmitry Monakhov
  2010-02-18 16:45         ` [PATCH 5/6] ext4: enlarge mount option field Dmitry Monakhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/quota/dquot.c      |   12 ++++++++++++
 fs/quota/quotaio_v2.h |    6 ++++--
 include/linux/quota.h |   12 +++++++++++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index de4b8fc..40075ea 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1090,6 +1090,11 @@ static int need_print_warning(struct dquot *dquot)
 			return current_fsuid() == dquot->dq_id;
 		case GRPQUOTA:
 			return in_group_p(dquot->dq_id);
+		case MGRQUOTA:
+			/* XXX: Currently there is no way to tell which
+			   metagroup this task belongs to, so print a
+			   warning message unconditionally. -dmon */
+			return 1;
 	}
 	return 0;
 }
@@ -1322,6 +1327,13 @@ int dquot_initialize(struct inode *inode, int type)
 		case GRPQUOTA:
 			id = inode->i_gid;
 			break;
+		case MGRQUOTA:
+			if (inode->i_sb->s_aux_attr &&
+				inode->i_sb->s_aux_attr->metagroup)
+				id = *inode->i_sb->s_aux_attr->metagroup(inode);
+			else
+				BUG_ON(sb_has_quota_loaded(inode->i_sb, MGRQUOTA));
+			break;
 		}
 		got[cnt] = dqget(sb, id, cnt);
 	}
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b4..c65c7fc 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
  */
 #define V2_INITQMAGICS {\
 	0xd9c01f11,	/* USRQUOTA */\
-	0xd9c01927	/* GRPQUOTA */\
+	0xd9c01927,	/* GRPQUOTA */\
+	0xd9c03f14	/* MGRQUOTA */\
 }
 
 #define V2_INITQVERSIONS {\
 	1,		/* USRQUOTA */\
-	1		/* GRPQUOTA */\
+	1,		/* GRPQUOTA */	\
+	1		/* MGRQUOTA */\
 }
 
 /* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 680605d..a8f6cbe 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -36,18 +36,28 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
-#define __DQUOT_VERSION__	"dquot_6.5.2"
+#define __DQUOT_VERSION__	"dquot_6.6.0"
 
+#ifdef CONFIG_METAGROUP
+#define MAXQUOTAS 3
+#else
 #define MAXQUOTAS 2
+#endif
+
 #define USRQUOTA  0		/* element used for user quotas */
 #define GRPQUOTA  1		/* element used for group quotas */
 
+#ifdef CONFIG_METAGROUP
+#define MGRQUOTA  2		/* element used for metagroup quotas */
+#endif
+
 /*
  * Definitions for the default names of the quotas files.
  */
 #define INITQFNAMES { \
 	"user",    /* USRQUOTA */ \
 	"group",   /* GRPQUOTA */ \
+	"metagroup",    /* MGRQUOTA */	\
 	"undefined", \
 };
 
-- 
1.6.6



* [PATCH 5/6] ext4: enlarge mount option field
  2010-02-18 16:45       ` [PATCH 4/6] quota: Implement metagroup support for quota Dmitry Monakhov
@ 2010-02-18 16:45         ` Dmitry Monakhov
  2010-02-18 16:45           ` [PATCH 6/6] ext4: Implement metagroup support for ext4 filesystem Dmitry Monakhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov

Currently only one bit is left in s_mount_opt. Let's
double its size for future use.
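
To illustrate why the wider type matters, here is a small stand-alone
sketch. The DEMO_* names and both helpers are hypothetical, and
DEMO_MOUNT_FUTURE is an invented flag at bit 32 that does not appear in
this series. It shows a flag above bit 31 being silently lost in a 32-bit
option word and preserved in a 64-bit one.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MOUNT_DISCARD	0x40000000ULL	/* same value as EXT4_MOUNT_DISCARD */
#define DEMO_MOUNT_FUTURE	0x100000000ULL	/* hypothetical flag at bit 32 */

/* Old layout: a 32-bit option word truncates anything above bit 31 */
uint32_t demo_set_opt32(uint32_t opts, uint64_t flag)
{
	assert(flag != 0);
	return (uint32_t)(opts | flag);
}

/* New layout: a 64-bit option word keeps the high bits */
uint64_t demo_set_opt64(uint64_t opts, uint64_t flag)
{
	assert(flag != 0);
	return opts | flag;
}
```

The same reasoning is why the format strings in this patch switch to
%llx: the argument passed to the print functions is now 64 bits wide.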

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/ext4.h      |   62 +++++++++++++++++++++++++-------------------------
 fs/ext4/ext4_jbd2.c |    2 +-
 fs/ext4/super.c     |    2 +-
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 874d169..b2c01a2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -421,7 +421,7 @@ struct ext4_new_group_data {
  *  Mount options
  */
 struct ext4_mount_options {
-	unsigned long s_mount_opt;
+	unsigned long long s_mount_opt;
 	uid_t s_resuid;
 	gid_t s_resgid;
 	unsigned long s_commit_interval;
@@ -738,35 +738,35 @@ struct ext4_inode_info {
 /*
  * Mount flags
  */
-#define EXT4_MOUNT_OLDALLOC		0x00002  /* Don't use the new Orlov allocator */
-#define EXT4_MOUNT_GRPID		0x00004	/* Create files with directory's group */
-#define EXT4_MOUNT_DEBUG		0x00008	/* Some debugging messages */
-#define EXT4_MOUNT_ERRORS_CONT		0x00010	/* Continue on errors */
-#define EXT4_MOUNT_ERRORS_RO		0x00020	/* Remount fs ro on errors */
-#define EXT4_MOUNT_ERRORS_PANIC		0x00040	/* Panic on errors */
-#define EXT4_MOUNT_MINIX_DF		0x00080	/* Mimics the Minix statfs */
-#define EXT4_MOUNT_NOLOAD		0x00100	/* Don't use existing journal*/
-#define EXT4_MOUNT_DATA_FLAGS		0x00C00	/* Mode for data writes: */
-#define EXT4_MOUNT_JOURNAL_DATA		0x00400	/* Write data to journal */
-#define EXT4_MOUNT_ORDERED_DATA		0x00800	/* Flush data before commit */
-#define EXT4_MOUNT_WRITEBACK_DATA	0x00C00	/* No data ordering */
-#define EXT4_MOUNT_UPDATE_JOURNAL	0x01000	/* Update the journal format */
-#define EXT4_MOUNT_NO_UID32		0x02000  /* Disable 32-bit UIDs */
-#define EXT4_MOUNT_XATTR_USER		0x04000	/* Extended user attributes */
-#define EXT4_MOUNT_POSIX_ACL		0x08000	/* POSIX Access Control Lists */
-#define EXT4_MOUNT_NO_AUTO_DA_ALLOC	0x10000	/* No auto delalloc mapping */
-#define EXT4_MOUNT_BARRIER		0x20000 /* Use block barriers */
-#define EXT4_MOUNT_NOBH			0x40000 /* No bufferheads */
-#define EXT4_MOUNT_QUOTA		0x80000 /* Some quota option set */
-#define EXT4_MOUNT_USRQUOTA		0x100000 /* "old" user quota */
-#define EXT4_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
-#define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000 /* Journal checksums */
-#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000 /* Journal Async Commit */
-#define EXT4_MOUNT_I_VERSION            0x2000000 /* i_version support */
-#define EXT4_MOUNT_DELALLOC		0x8000000 /* Delalloc support */
-#define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000 /* Abort on file data write */
-#define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000 /* Block validity checking */
-#define EXT4_MOUNT_DISCARD		0x40000000 /* Issue DISCARD requests */
+#define EXT4_MOUNT_OLDALLOC		0x00002LL  /* Don't use the new Orlov allocator */
+#define EXT4_MOUNT_GRPID		0x00004LL /* Create files with directory's group */
+#define EXT4_MOUNT_DEBUG		0x00008LL /* Some debugging messages */
+#define EXT4_MOUNT_ERRORS_CONT		0x00010LL /* Continue on errors */
+#define EXT4_MOUNT_ERRORS_RO		0x00020LL /* Remount fs ro on errors */
+#define EXT4_MOUNT_ERRORS_PANIC		0x00040LL /* Panic on errors */
+#define EXT4_MOUNT_MINIX_DF		0x00080LL /* Mimics the Minix statfs */
+#define EXT4_MOUNT_NOLOAD		0x00100LL /* Don't use existing journal*/
+#define EXT4_MOUNT_DATA_FLAGS		0x00C00LL /* Mode for data writes: */
+#define EXT4_MOUNT_JOURNAL_DATA		0x00400LL /* Write data to journal */
+#define EXT4_MOUNT_ORDERED_DATA		0x00800LL /* Flush data before commit */
+#define EXT4_MOUNT_WRITEBACK_DATA	0x00C00LL /* No data ordering */
+#define EXT4_MOUNT_UPDATE_JOURNAL	0x01000LL /* Update the journal format */
+#define EXT4_MOUNT_NO_UID32		0x02000LL /* Disable 32-bit UIDs */
+#define EXT4_MOUNT_XATTR_USER		0x04000LL /* Extended user attributes */
+#define EXT4_MOUNT_POSIX_ACL		0x08000LL /* POSIX Access Control Lists */
+#define EXT4_MOUNT_NO_AUTO_DA_ALLOC	0x10000LL /* No auto delalloc mapping */
+#define EXT4_MOUNT_BARRIER		0x20000LL /* Use block barriers */
+#define EXT4_MOUNT_NOBH			0x40000LL /* No bufferheads */
+#define EXT4_MOUNT_QUOTA		0x80000LL /* Some quota option set */
+#define EXT4_MOUNT_USRQUOTA		0x100000LL /* "old" user quota */
+#define EXT4_MOUNT_GRPQUOTA		0x200000LL /* "old" group quota */
+#define EXT4_MOUNT_JOURNAL_CHECKSUM	0x800000LL /* Journal checksums */
+#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT	0x1000000LL /* Journal Async Commit */
+#define EXT4_MOUNT_I_VERSION            0x2000000LL /* i_version support */
+#define EXT4_MOUNT_DELALLOC		0x8000000LL /* Delalloc support */
+#define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000LL /* Abort on file data write */
+#define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000LL /* Block validity checking */
+#define EXT4_MOUNT_DISCARD		0x40000000LL /* Issue DISCARD requests */
 
 #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
 #define set_opt(o, opt)			o |= EXT4_MOUNT_##opt
@@ -915,7 +915,7 @@ struct ext4_sb_info {
 	struct buffer_head * s_sbh;	/* Buffer containing the super block */
 	struct ext4_super_block *s_es;	/* Pointer to the super block in the buffer */
 	struct buffer_head **s_group_desc;
-	unsigned int s_mount_opt;
+	unsigned long long s_mount_opt;
 	unsigned int s_mount_flags;
 	ext4_fsblk_t s_sb_block;
 	uid_t s_resuid;
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index b57e5c7..36e1f98 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -58,7 +58,7 @@ int __ext4_forget(const char *where, handle_t *handle, int is_metadata,
 	BUFFER_TRACE(bh, "enter");
 
 	jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, "
-		  "data mode %x\n",
+		  "data mode %llx\n",
 		  bh, is_metadata, inode->i_mode,
 		  test_opt(inode->i_sb, DATA_FLAGS));
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 84a51d9..80d6c14 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1700,7 +1700,7 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es,
 	ext4_commit_super(sb, 1);
 	if (test_opt(sb, DEBUG))
 		printk(KERN_INFO "[EXT4 FS bs=%lu, gc=%u, "
-				"bpg=%lu, ipg=%lu, mo=%04x]\n",
+				"bpg=%lu, ipg=%lu, mo=%08llx]\n",
 			sb->s_blocksize,
 			sbi->s_groups_count,
 			EXT4_BLOCKS_PER_GROUP(sb),
-- 
1.6.6



* [PATCH 6/6] ext4: Implement metagroup support for ext4 filesystem
  2010-02-18 16:45         ` [PATCH 5/6] ext4: enlarge mount option field Dmitry Monakhov
@ 2010-02-18 16:45           ` Dmitry Monakhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Dmitry Monakhov


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/Kconfig           |    8 +++
 fs/ext4/Makefile          |    1 +
 fs/ext4/ext4.h            |    8 ++-
 fs/ext4/ialloc.c          |    5 +-
 fs/ext4/inode.c           |   13 ++++-
 fs/ext4/super.c           |    9 +++-
 fs/ext4/xattr.c           |    7 ++
 fs/ext4/xattr.h           |   11 +++
 fs/ext4/xattr_metagroup.c |  153 +++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 fs/ext4/xattr_metagroup.c

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 9ed1bb1..e3365db 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -74,6 +74,14 @@ config EXT4_FS_SECURITY
 
 	  If you are not using a security module that requires using
 	  extended attributes for file security labels, say N.
+config EXT4_METAGROUP
+	bool "Ext4 metagroup support"
+	depends on METAGROUP
+	depends on EXT4_FS_XATTR
+	help
+	  Enables metagroup inode identifier support for the ext4 filesystem.
+	  This feature allows assigning an id to inodes, similar to
+	  uid/gid.
 
 config EXT4_DEBUG
 	bool "EXT4 debugging support"
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8867b2a..62f75b8 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -11,3 +11,4 @@ ext4-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
 ext4-$(CONFIG_EXT4_FS_XATTR)		+= xattr.o xattr_user.o xattr_trusted.o
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
 ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
+ext4-$(CONFIG_EXT4_METAGROUP)		+= xattr_metagroup.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b2c01a2..c3f95e7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -719,6 +719,10 @@ struct ext4_inode_info {
 	 */
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
+#ifdef CONFIG_EXT4_METAGROUP
+	/* metagroup id, additional owner identifier similar to uid/gid */
+	unsigned int i_mid;
+#endif
 };
 
 /*
@@ -766,7 +770,9 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DELALLOC		0x8000000LL /* Delalloc support */
 #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000LL /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000LL /* Block validity checking */
-#define EXT4_MOUNT_DISCARD		0x40000000LL /* Issue DISCARD requests */
+#define EXT4_MOUNT_DISCARD		0x40000000LL /* Issue DISCARD requests 
+*/
+#define EXT4_MOUNT_METAGROUP		0x80000000LL /* extended owner id */
 
 #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
 #define set_opt(o, opt)			o |= EXT4_MOUNT_##opt
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index f3624ea..535b905 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1032,7 +1032,10 @@ got:
 	ei->i_state = EXT4_STATE_NEW;
 
 	ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;
-
+#ifdef CONFIG_EXT4_METAGROUP
+	// XXX: move this to generic inode init helper
+	EXT4_I(inode)->i_mid = EXT4_I(dir)->i_mid;
+#endif
 	ret = inode;
 	if (vfs_dq_alloc_inode(inode)) {
 		err = -EDQUOT;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e119524..b1b5fdc 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4936,7 +4936,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	}
 	if (ret)
 		goto bad_inode;
-
+#ifdef CONFIG_EXT4_METAGROUP
+	if(test_opt(inode->i_sb, METAGROUP)) {
+		ret = ext4_metagroup_read(inode, &ei->i_mid);
+		if (ret == -ENODATA) {
+			ei->i_mid = 0;
+			ret = 0;
+		}
+		if (ret)
+			goto bad_inode;
+	} else
+		ei->i_mid = 0;
+#endif
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 80d6c14..cb169f8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -923,6 +923,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, DISCARD))
 		seq_puts(seq, ",discard");
 
+	if (test_opt(sb, METAGROUP))
+		seq_puts(seq, ",metagroup");
+
 	if (test_opt(sb, NOLOAD))
 		seq_puts(seq, ",norecovery");
 
@@ -1112,7 +1115,7 @@ enum {
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc,
 	Opt_block_validity, Opt_noblock_validity,
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
-	Opt_discard, Opt_nodiscard,
+	Opt_discard, Opt_nodiscard, Opt_metagroup,
 };
 
 static const match_table_t tokens = {
@@ -1181,6 +1184,7 @@ static const match_table_t tokens = {
 	{Opt_noauto_da_alloc, "noauto_da_alloc"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_metagroup, "metagroup"},
 	{Opt_err, NULL},
 };
 
@@ -1612,6 +1616,9 @@ set_qf_format:
 		case Opt_nodiscard:
 			clear_opt(sbi->s_mount_opt, DISCARD);
 			break;
+		case Opt_metagroup:
+			set_opt(sbi->s_mount_opt, METAGROUP);
+			break;
 		default:
 			ext4_msg(sb, KERN_ERR,
 			       "Unrecognized mount option \"%s\" "
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index f3a2f7e..a97294b 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -107,6 +107,10 @@ static struct xattr_handler *ext4_xattr_handler_map[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	[EXT4_XATTR_INDEX_SECURITY]	     = &ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_METAGROUP
+	[EXT4_XATTR_INDEX_METAGROUP]	     = &ext4_xattr_metagroup_handler,
+#endif
+
 };
 
 struct xattr_handler *ext4_xattr_handlers[] = {
@@ -119,6 +123,9 @@ struct xattr_handler *ext4_xattr_handlers[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	&ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_METAGROUP
+	&ext4_xattr_metagroup_handler,
+#endif
 	NULL
 };
 
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 8ede88b..46b8369 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -21,6 +21,7 @@
 #define EXT4_XATTR_INDEX_TRUSTED		4
 #define	EXT4_XATTR_INDEX_LUSTRE			5
 #define EXT4_XATTR_INDEX_SECURITY	        6
+#define EXT4_XATTR_INDEX_METAGROUP	        7
 
 struct ext4_xattr_header {
 	__le32	h_magic;	/* magic number for identification */
@@ -70,6 +71,7 @@ extern struct xattr_handler ext4_xattr_trusted_handler;
 extern struct xattr_handler ext4_xattr_acl_access_handler;
 extern struct xattr_handler ext4_xattr_acl_default_handler;
 extern struct xattr_handler ext4_xattr_security_handler;
+extern struct xattr_handler ext4_xattr_metagroup_handler;
 
 extern ssize_t ext4_listxattr(struct dentry *, char *, size_t);
 
@@ -153,3 +155,12 @@ static inline int ext4_init_security(handle_t *handle, struct inode *inode,
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_EXT4_METAGROUP
+extern int ext4_metagroup_read(struct inode *inode, unsigned int *mid);
+#else
+static inline int ext4_metagroup_read(struct inode *inode, unsigned int *mid)
+{
+	return -ENOTSUPP;
+}
+#endif
diff --git a/fs/ext4/xattr_metagroup.c b/fs/ext4/xattr_metagroup.c
new file mode 100644
index 0000000..5585d4d
--- /dev/null
+++ b/fs/ext4/xattr_metagroup.c
@@ -0,0 +1,153 @@
+/*
+ * linux/fs/ext4/xattr_metagroup.c
+ *
+ * Copyright (C) 2010 Parallels Inc
+ * Dmitry Monakhov <dmonakhov@openvz.org>
+ */
+
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/capability.h>
+#include <linux/fs.h>
+#include <linux/quotaops.h>
+#include "ext4_jbd2.h"
+#include "ext4.h"
+#include "xattr.h"
+
+/*
+ * Read metagroup id from inode's xattr
+ * Locking: none
+ */
+int ext4_metagroup_read(struct inode *inode, unsigned int *mid)
+{
+	__le32 dsk_mid;
+	int retval;
+	retval = ext4_xattr_get(inode, EXT4_XATTR_INDEX_METAGROUP, "",
+				&dsk_mid, sizeof (dsk_mid));
+	if (retval > 0 && retval != sizeof(dsk_mid))
+		return -EIO;
+	*mid = le32_to_cpu(dsk_mid);
+	return retval;
+
+}
+
+/*
+ * Save metagroup id to inode's xattr
+ * Locking: none
+ */
+static int ext4_metagroup_write(handle_t *handle, struct inode *inode,
+				unsigned int mid, int xflags)
+{
+	__le32 dsk_mid = cpu_to_le32(mid);
+	int retval;
+	retval = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_METAGROUP, "",
+				&dsk_mid, sizeof (dsk_mid), xflags);
+	if (retval > 0 && retval != sizeof(dsk_mid))
+		return -EIO;
+	return retval;
+}
+
+/*
+ * Change metagroup id.
+ * Called under inode->i_mutex
+ */
+static int ext4_metagroup_change(struct inode *inode, unsigned int new_mid)
+{
+	/*
+	 * One data_trans_blocks chunk for xattr update.
+	 * One quota_trans_blocks chunk for quota transfer, and one
+	 * quota_trans_block chunk for emergency quota rollback transfer,
+	 * because quota rollback may result new quota blocks allocation.
+	 */
+	unsigned credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
+		EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb) * 2;
+	qid_t qid[MAXQUOTAS];
+	int ret, ret2 = 0;
+	unsigned retries = 0;
+	handle_t *handle;
+
+	vfs_dq_init(inode);
+retry:
+	handle = ext4_journal_start(inode, credits);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		ext4_std_error(inode->i_sb, ret);
+		goto out;
+	}
+	/* Inode may not have metagroup xattr yet. Create it explicitly */
+	ret = ext4_metagroup_write(handle, inode, EXT4_I(inode)->i_mid,
+			XATTR_CREATE);
+	if (ret == -EEXIST)
+		ret = 0;
+	if (ret) {
+		ret2 = ext4_journal_stop(handle);
+		if (ret2)
+			ret = ret2;
+		if (ret == -ENOSPC &&
+			ext4_should_retry_alloc(inode->i_sb, &retries))
+			goto retry;
+	}
+#ifdef CONFIG_QUOTA
+	qid[MGRQUOTA] = new_mid;
+	if (inode->i_sb->dq_op->transfer(inode, qid, 1 << MGRQUOTA))
+		ret = -EDQUOT;
+#endif
+	ret = ext4_metagroup_write(handle, inode, new_mid, XATTR_REPLACE);
+	if (ret) {
+		/*
+		 * This can fail only due to a fatal error. Nevertheless
+		 * we have to try to roll back the quota changes.
+		 */
+#ifdef CONFIG_QUOTA
+		qid[MGRQUOTA] = EXT4_I(inode)->i_mid;
+		if (inode->i_sb->dq_op->transfer(inode, qid, 1 << MGRQUOTA))
+			ret = -EDQUOT;
+#endif
+		ext4_std_error(inode->i_sb, ret);
+
+	}
+	EXT4_I(inode)->i_mid = new_mid;
+	ret2 = ext4_journal_stop(handle);
+out:
+	if (ret2)
+		ret = ret2;
+	return ret;
+}
+static size_t
+ext4_xattr_metagroup_list(struct dentry *dentry, char *list, size_t list_size,
+		const char *name, size_t name_len, int type)
+{
+	if (list && XATTR_METAGROUP_LEN <= list_size)
+		memcpy(list, XATTR_METAGROUP, XATTR_METAGROUP_LEN);
+	return XATTR_METAGROUP_LEN;
+
+}
+
+static int
+ext4_xattr_metagroup_get(struct dentry *dentry, const char *name,
+		       void *buffer, size_t size, int type)
+{
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	return ext4_xattr_get(dentry->d_inode, EXT4_XATTR_INDEX_METAGROUP,
+			      name, buffer, size);
+}
+
+static int
+ext4_xattr_metagroup_set(struct dentry *dentry, const char *name,
+		const void *value, size_t size, int flags, int type)
+{
+	unsigned int new_mid;
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	new_mid = simple_strtoul(value, (char **)&value, 0);
+	return ext4_metagroup_change(dentry->d_inode, new_mid);
+}
+
+struct xattr_handler ext4_xattr_metagroup_handler = {
+	.prefix	= XATTR_METAGROUP,
+	.list	= ext4_xattr_metagroup_list,
+	.get	= ext4_xattr_metagroup_get,
+	.set	= ext4_xattr_metagroup_set,
+};
-- 
1.6.6


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
  2010-02-18 16:45   ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
@ 2010-02-18 19:00   ` Brad Boyer
  2010-02-18 19:34     ` Dmitry Monakhov
  1 sibling, 1 reply; 13+ messages in thread
From: Brad Boyer @ 2010-02-18 19:00 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-fsdevel

On Thu, Feb 18, 2010 at 07:45:25PM +0300, Dmitry Monakhov wrote:
> Some times it is useful to export non standard attributes
> to generic vfs layer, but it is too expansive to store it
> inside vfs inode. Let's introduce generic interface for this
> purpose. One may declare an attribute and filesystem provides
> access to it, if necessery.
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
> @@ -1576,7 +1578,10 @@ struct super_operations {
>  #endif
>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
>  };
> -
> +struct aux_attributes
> +{
> +	int supported;
> +};
>  /*
>   * Inode state bits.  Protected by inode_lock.
>   *
> -- 

What is the intended use of the supported field? You don't appear to use
it anywhere other than to initialize it to 1 in the one instance where
you create one of them.

	Brad Boyer
	flar@allandria.com


* Re: [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 19:00   ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
@ 2010-02-18 19:34     ` Dmitry Monakhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 19:34 UTC (permalink / raw)
  To: Brad Boyer; +Cc: linux-fsdevel

Brad Boyer <flar@allandria.com> writes:

> On Thu, Feb 18, 2010 at 07:45:25PM +0300, Dmitry Monakhov wrote:
>> Some times it is useful to export non standard attributes
>> to generic vfs layer, but it is too expansive to store it
>> inside vfs inode. Let's introduce generic interface for this
>> purpose. One may declare an attribute and filesystem provides
>> access to it, if necessery.
>> 
>> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>> ---
>> @@ -1576,7 +1578,10 @@ struct super_operations {
>>  #endif
>>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
>>  };
>> -
>> +struct aux_attributes
>> +{
>> +	int supported;
>> +};
>>  /*
>>   * Inode state bits.  Protected by inode_lock.
>>   *
>> -- 
>
> What is the intended use of the supported field? You don't appear to use
> it anywhere other than to initialize it to 1 in the one instance where
> you create one of them.
Actually I've used this only as a placeholder; otherwise the structure
would be empty.
>
> 	Brad Boyer
> 	flar@allandria.com

* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4
  2010-02-18 16:45 [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dmitry Monakhov
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
@ 2010-02-18 23:31 ` Dave Chinner
  2010-02-19 10:16   ` Dmitry Monakhov
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2010-02-18 23:31 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-fsdevel

On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote:
> This is new generation of attempt to add extended inode identifier.
> In previous posts it was called tree_id, subtree_id, project_id.
> But after none of this was not good enough. I've refused project_id
> because it is well know XFS feature.

Admins, users and developers of management tools are all going to
hate us if we introduce subtly different "project/directory quota
like" accounting to different filesystems with different
administration mechanisms.

The fact that project quotas are already implemented in XFS is not a
valid reason for creating a new, slightly less functional,
incompatible implementation of the same feature in other
filesystems.

> And my implementation is
> slightly different from it especially from user-space point of view.

This is exactly my point - if a user has an ext4 filesystem and an
xfs filesystem then your proposal will result in them needing two
different mechanisms to manage the project/directory quotas on their
filesystems.  This result is not desirable from a system design
perspective.  Management of such a feature needs to be consistent
across all filesystem types - just like it is for user and group
quotas - and we already have a widely used and well tested
management interface that can be used to implement exactly what you
need.

> In order to avoid ambiguity i've stopped at the "metagroup" term.
> I hope it is final name for the feature.

I think "metagroup" is too abstract and will likely be confused with
group quotas by those that don't understand what it is. i.e. it does
not convey any information about the bounds of the quota container
(unlike user, group, directory or project).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4
  2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
@ 2010-02-19 10:16   ` Dmitry Monakhov
  2010-02-19 23:31     ` Dave Chinner
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-19 10:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

Dave Chinner <david@fromorbit.com> writes:

> On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote:
>> This is new generation of attempt to add extended inode identifier.
>> In previous posts it was called tree_id, subtree_id, project_id.
>> But after none of this was not good enough. I've refused project_id
>> because it is well know XFS feature.
>
> Admins, users and developers of mangement tools are all going to
> hate us if we introduce subtly different "project/directory quota
> like" accounting to different filesystems with different
> administration mechanisms.
Seems that you are right here.
>
> The fact that project quotas are already implemented in XFS is not a
> valid reason for creating a new, slightly less functional,
> incompatible implementation of the same feature in other
> filesystems.
>
>> And my implementation is
>> slightly different from it especially from user-space point of view.
>
> This is exactly my point - if a user has an ext4 filesystem and an
> xfs filesystem then your proposal will result in them needing two
> different mechanisms to manage the project/directory quotas on their
> filesystems.  This result is not desirable from a system design
> perspective.  Management of such a feature needs to be consistent
> across all filesystem types - just like it is for user and group
> quotas - and we already have a widely used and well tested
> management interface that can be used to implement exactly what you
> need.
Not exactly. XFS allows only a subtree-like structure (link and rename
are restricted). Personally I think that is the right restriction, but
someone may want a hierarchy that is not subtree-like. So this patch
doesn't introduce any link/rename rules. If a user wants to restrict his
tree, he will use a bind mount. IMHO that is more intuitive than what
XFS does.
But again, you are definitely right about the ambiguity of the feature
names and interfaces. If we can create a common interface it would be
great. See later in this mail.
>
>> In order to avoid ambiguity i've stopped at the "metagroup" term.
>> I hope it is final name for the feature.
>
> I think "metagroup" is too abstract and will likely be confused with
> group quotas by those that don't understand what it is. i.e it does
> not convey any information about the bounds of the quota container
> (unlike user, group, directory or project).
Ok. Since we want a common interface we should use the well-known
"project_id" term.

I think we can try to unify it in the following way:
*User interface*
As far as I understand, XFS manages projid via xfs_ioctl_setattr and
struct fsxattr. IMHO it is not a good idea to make this interface common
for all filesystems. Let's use the standard i_op->setxattr/getxattr for
this purpose. Let's name this xattr "system.project_id".
XFS may easily catch the corresponding setxattr/getxattr and translate
it to its ioctl interface, so both interfaces will be equivalent.
At least the xattr interface is already supported by various utils (tar,
rsync, etc).

*Link/Rename behavior*
 Let's introduce two modes:
 1) SHARED project hierarchy: no restrictions on link/rename
 2) ISOLATED project hierarchy: the well-known XFS (subtree-like)
    link/rename rules
 And support these two modes like this:
 generic_fs)
       SHARED: by default
       ISOLATED: via bind mount
 XFS)
       ISOLATED: by default, because these are the expected semantics
                 (no changes required)
       SHARED: XFS may add a "shared_project" mount option to disable
               the isolation semantics. At least this gives the user
               more flexibility than before.
 We have to document this difference in order to avoid misbehavior.

*VFS interface to project_id*
 In order to benefit from project_id we have to make it visible to
 the vfs layer, and let quota and nfsd (any other users?) exploit it.
 Let's use the proposed per-sb aux_attributes table for this purpose.
 Of course I was wrong when I proposed to export a pointer to the
 project_id (former metagroup) variable. Since this value is read-only
 we have to export it like this:
     unsigned get_project_id(struct inode *inode)
 And document that project_id changes are guarded by inode->i_mutex,
 so the caller has to grab i_mutex in order to avoid races.
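
To make the accessor contract above concrete, here is a minimal user-space
sketch in C. It is a model, not kernel code: struct model_inode, the
i_mutex_held flag, model_lock(), model_unlock(), set_project_id() and
read_project_id() are hypothetical names introduced for illustration. Only
the get_project_id() name and the "hold i_mutex" rule come from the proposal.
The lock is modelled as a plain flag so that the rule can be checked with
assert() in a single-threaded program.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for an inode; not a kernel structure. */
struct model_inode {
	int i_mutex_held;	/* models "the caller holds inode->i_mutex" */
	unsigned int i_prjid;	/* models the ID cached in the fs-private inode */
};

static void model_lock(struct model_inode *inode)
{
	assert(!inode->i_mutex_held);
	inode->i_mutex_held = 1;
}

static void model_unlock(struct model_inode *inode)
{
	assert(inode->i_mutex_held);
	inode->i_mutex_held = 0;
}

/* Returns the ID by value, so callers never get a pointer to the field. */
static unsigned int get_project_id(const struct model_inode *inode)
{
	assert(inode->i_mutex_held);	/* caller must hold the lock */
	return inode->i_prjid;
}

/* Only the filesystem changes the ID, and only under the same lock. */
static void set_project_id(struct model_inode *inode, unsigned int new_id)
{
	assert(inode->i_mutex_held);
	inode->i_prjid = new_id;
}

/* Convenience wrapper: take the lock, read the ID, drop the lock. */
static unsigned int read_project_id(struct model_inode *inode)
{
	unsigned int id;

	model_lock(inode);
	id = get_project_id(inode);
	model_unlock(inode);
	return id;
}
```

A caller such as quota or nfsd would use read_project_id() or take the lock
itself around get_project_id(). Returning a copy instead of a pointer means a
reader cannot observe or cause a half-finished update.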

What do you think?

* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4
  2010-02-19 10:16   ` Dmitry Monakhov
@ 2010-02-19 23:31     ` Dave Chinner
  2010-02-20 10:58       ` Dmitry Monakhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2010-02-19 23:31 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-fsdevel

On Fri, Feb 19, 2010 at 01:16:47PM +0300, Dmitry Monakhov wrote:
> Dave Chinner <david@fromorbit.com> writes:
> 
> > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote:
> >> This is new generation of attempt to add extended inode identifier.
> >> In previous posts it was called tree_id, subtree_id, project_id.
> >> But after none of this was not good enough. I've refused project_id
> >> because it is well know XFS feature.
> >
> > Admins, users and developers of mangement tools are all going to
> > hate us if we introduce subtly different "project/directory quota
> > like" accounting to different filesystems with different
> > administration mechanisms.
> Seems what you right here.
> >
> > The fact that project quotas are already implemented in XFS is not a
> > valid reason for creating a new, slightly less functional,
> > incompatible implementation of the same feature in other
> > filesystems.
> >
> >> And my implementation is
> >> slightly different from it especially from user-space point of view.
> >
> > This is exactly my point - if a user has an ext4 filesystem and an
> > xfs filesystem then your proposal will result in them needing two
> > different mechanisms to manage the project/directory quotas on their
> > filesystems.  This result is not desirable from a system design
> > perspective.  Management of such a feature needs to be consistent
> > across all filesystem types - just like it is for user and group
> > quotas - and we already have a widely used and well tested
> > management interface that can be used to implement exactly what you
> > need.
> Not exactly. XFS  allow only subtree-like structure

Not true at all.  XFS allows an arbitrary distribution of files in a
given project - they are not restricted to subtrees. This isn't
widely used because it requires manually setting the project ID
after the file is created. e.g. create a backup tarball of a project
hierarchy in an external non-controlled directory, then change the
project ID of the tarball to the correct project ID so that the
backup is also accounted to the correct project...

For example, I'll create a new project (testproj) and subtree
(/mnt/xfs/foo) associated with the project, create a 25MB file
inside the subtree, show it being accounted, then copy it outside
the subtree, show it isn't accounted, then change the project ID
of the outside copy to testproj and show that it is accounted to
the testproj even though it is outside the subtree:

# mkfs.xfs -f /dev/ubd/1
[.....]
# mount -o prjquota /dev/ubd/1 /mnt/xfs
# mkdir /mnt/xfs/foo
#
#
# echo testproj:42 >> /etc/projid
# echo 42:/mnt/xfs/foo >> /etc/projects
# xfs_quota -x -c 'project -s testproj' /mnt/xfs
Setting up project testproj (path /mnt/xfs/foo)...
Processed 1 /etc/projects paths for project testproj
#
#
#
# xfs_quota -x -c 'limit -p bhard=1g testproj' /mnt/xfs
# xfs_quota -x -c print /mnt/xfs
Filesystem          Pathname
/mnt/xfs            /dev/ubd/1 (pquota)
/mnt/xfs/foo        /dev/ubd/1 (project 42, testproj)
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj            0          0    1048576     00 [--------]

#
#
#
# dd if=/dev/zero of=foo/testfile bs=1024k count=25
25+0 records in
25+0 records out
26214400 bytes (26 MB) copied, 0.116102 s, 226 MB/s
# sudo xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# cp foo/testfile .
# sync
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# xfs_io -f -c "chproj 42" testfile
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        51200          0    1048576     00 [--------]

#

> (link, rename are restricted).

The EXDEV on rename behaviour is purely an implementation detail -
it makes quota accounting in XFS simple. i.e. rename returns EXDEV
so that a mv(1) will fall back to create/copy/unlink and that
automatically gets the quota accounting correct. That is, it didn't
require a complex extension of dquot handling in the rename
transaction to implement.  This one could be fixed, and a couple of
ppl have actually asked recently if it could be done because moving
a few TB of data between projects is time consuming.
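
The fallback described above can be sketched in user space. The following C
fragment is an illustration only, assuming a plain regular file: move_file()
and copy_file() are hypothetical helper names, and unlike mv(1) the copy path
does not preserve ownership, mode or timestamps. When rename(2) fails with
EXDEV, the data is written into a newly created file and the source is
unlinked, so the usage is charged and released through the ordinary create
and unlink paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
static int copy_file(const char *src, const char *dst)
{
	char buf[4096];
	size_t n;
	int rc = 0;
	FILE *in, *out;

	assert(src != NULL && dst != NULL);
	in = fopen(src, "rb");
	if (!in)
		return -1;
	out = fopen(dst, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			rc = -1;
			break;
		}
	}
	if (ferror(in))
		rc = -1;
	fclose(in);
	if (fclose(out) != 0)
		rc = -1;
	return rc;
}

/* Try rename(2) first; on EXDEV fall back to copy + unlink. */
static int move_file(const char *src, const char *dst)
{
	assert(src != NULL && dst != NULL);
	if (rename(src, dst) == 0)
		return 0;
	if (errno != EXDEV)
		return -1;
	if (copy_file(src, dst) != 0) {
		unlink(dst);	/* best-effort cleanup of a partial copy */
		return -1;
	}
	return unlink(src);
}
```

The cost of this approach is the one mentioned above: every byte is copied,
which is slow for large amounts of data.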

However, hard links are a different matter. If you can clearly
determine how to hard link a file into multiple different projects
(dquots), then track and account for all the space used in a sane
manner, work out how to account for new or removed files in such a
hardlinked directory, etc, then you can allow hard links between
different subtrees.

For example, if you add a new file into such a hard linked
directory, who does it get accounted to? What happens if you then
move a multiple-hard linked file to a different subtree? If the
inode is accounted to all projects, then each of these filesystem
transactions requires updating an arbitrary (unbounded) number of
dquots - this alone makes journal reservations for transactions a
nightmare to calculate and greatly increases the complexity of such
transactions.

Disallowing hard links between directories in different projects
makes these cans of worms go away - it is a very practical design
choice to make. However, it in no way results in XFS project quotas
being restricted to subtrees - it is a *change of project quota*
that triggers these behaviours.

> Personally I think what right restriction, but someone may
> want to have not subtree-like hierarchy. So this patch doesn't introduce
> any link/rename rules.

The link/rename behaviour of XFS does not prevent this type of usage
at all.

> If user want to restrict his tree it will use
> bindmount. IMHO it is more intuitive than XFS does.

XFS is not trying to implement bind mount -like restrictions. The
behaviour was carefully designed to allow project quotas to be
sanely implemented.

> But again you definitely right about feature_names/interfaces ambiguity 
> If we can create common interface it would be great. See later in 
> the mail.
> >
> >> In order to avoid ambiguity i've stopped at the "metagroup" term.
> >> I hope it is final name for the feature.
> >
> > I think "metagroup" is too abstract and will likely be confused with
> > group quotas by those that don't understand what it is. i.e it does
> > not convey any information about the bounds of the quota container
> > (unlike user, group, directory or project).
> Ok. Since we want common interface we should use well known "project_id"
> term.
> 
> I think we can try to unify it in following way:
> *User interface*
> As soon as i understand XFS manage projid via xfs_ioctl_setattr, 
> struct fsxattr. IMHO it is not good idea to make this interface common
> for all filesystems. Let's use standard i_op->setxattr/getxattr for
> this purpose. Let's name this xattr as "system.project_id".

That's fine by me. I'd much prefer that we used the xattr interface
for inode attributes instead of poking bits through fcntl or ioctls...

> And xfs may easily catch corresponding setxattr/getxatrr and translate
> it to it's ioctl interface, so both interfaces will be equal.
> At least xattr interface already supported by various utils (tar,
> rsync, etc).

Well, the point of the way XFS implements project quotas is that
utilities such as cp, mv, tar, rsync, etc do not need to know
anything about them - just like user/group quotas.

If we go down the xattr route, then these utilities can't be allowed
to copy these xattrs to new files; the filesystem has to create them
atomically with the new inodes so that they are accounted correctly.
If they are created non-atomically and the system crashes between
creating the file and applying the quota xattr, then you have an
inconsistency that only a quotacheck will pick up....

> *Link/Rename behavior*
>  Let's introduce two modes:
>  1) SHARED project hierarchy: without restrictions for link/renames

See above - I don't think "without restrictions" can be easily
implemented because of the complexity hard links introduce.

>  2) ISOLATED project hierarchy: Well known XFS (subtrees like)
>     link/rename rules
>  And support this two mode like this:
>  generic_fs)
>        SHARED: by default 
>        ISOLATED: via bindmount
>  XFS)

This is a change of behaviour from the existing XFS project quota
configurations as they do not require bind mounts at all.

I'm interested to know how you see this working when you have
multiple subtrees with the same project ID? Renaming and linking
between those subtrees is currently possible with XFS project IDs,
but adding bind mounts would cause EXDEV to be returned for these
operations. i.e. It seems to me that these subtrees are "shared" by
your definition, but the addition of bind mounts makes them
"isolated".

Or you want a part of a subtree to be moved to a different project
ID because it needs to be accounted separately?  e.g. a group gets
moved in the organisation hierarchy, so the bean counters want to
change the project ID on all their files so their space usage can be
billed to the new department. If bind mounts are involved, this
quickly becomes complex and unmaintainable. It's not something that
users can easily manage, especially compared to the current 'xfs_io
-c "chproj -R <projid>" /path/to/subtree' method of doing this.

----

IMO focusing on link/rename restrictions as the deciding factor in
defining the user interface is wrong. I started out by saying that
having different user interfaces for different filesystems is not
desirable. You've ended up trying to encode the differences you
assume exist into a new user interface instead.

I'll rephrase the question - what part of the existing XFS project
quota administration interface (i.e. /etc/projects, /etc/projid,  a
quota command to set up the initial tree, etc) is not sufficient for
your purposes of defining and managing subtrees?  If it is not
sufficient, what simple extensions can we add that will make it
sufficient? Once we've got the high level management interface
defined, everything else is just details. ;)

>        ISOLATED: by default, because this is expected semantics (no
>                  changes required)
>        SHARED: xfs may add "shared_project" mount feature to disable
>                isolation semantics. At least this gives user more
>                flexibility than before.
>  We have to document such difference. In order to avoid misbehavior.

> *VFS interface to project_id*
>  In order to make profit of project_id we have to make it visible to
>  vfs layer, and let quota and nfsd (any other users?) exploit this.
>  Let's use proposed per-sb aux_attributes table for this purpose.

Why go to that complexity? Just add a 32 bit proj_id identifier to
the struct inode. If it's supposed to be generic, then simply
implement it like user and group quotas are.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4
  2010-02-19 23:31     ` Dave Chinner
@ 2010-02-20 10:58       ` Dmitry Monakhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-20 10:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 14446 bytes --]

Dave Chinner <david@fromorbit.com> writes:

> On Fri, Feb 19, 2010 at 01:16:47PM +0300, Dmitry Monakhov wrote:
>> Dave Chinner <david@fromorbit.com> writes:
>> 
>> > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote:
>> >> This is new generation of attempt to add extended inode identifier.
>> >> In previous posts it was called tree_id, subtree_id, project_id.
>> >> But after none of this was not good enough. I've refused project_id
>> >> because it is well know XFS feature.
>> >
>> > Admins, users and developers of mangement tools are all going to
>> > hate us if we introduce subtly different "project/directory quota
>> > like" accounting to different filesystems with different
>> > administration mechanisms.
>> Seems what you right here.
>> >
>> > The fact that project quotas are already implemented in XFS is not a
>> > valid reason for creating a new, slightly less functional,
>> > incompatible implementation of the same feature in other
>> > filesystems.
>> >
>> >> And my implementation is
>> >> slightly different from it especially from user-space point of view.
>> >
>> > This is exactly my point - if a user has an ext4 filesystem and an
>> > xfs filesystem then your proposal will result in them needing two
>> > different mechanisms to manage the project/directory quotas on their
>> > filesystems.  This result is not desirable from a system design
>> > perspective.  Management of such a feature needs to be consistent
>> > across all filesystem types - just like it is for user and group
>> > quotas - and we already have a widely used and well tested
>> > management interface that can be used to implement exactly what you
>> > need.
>> Not exactly. XFS  allow only subtree-like structure
>
> Not true at all.  XFS allows an arbitrary distribution of files in a
> given project - they are not restricted to subtrees. This isn't
> widely used because it requires manually setting the project ID
> after the file is created. e.g. create a backup tarball of a project
> heirarchy in an external non-controlled directory, then change the
> project ID of the tarball to the correct project ID so that the
> backup is also accounted to the correct project...
>
> For example, I'll create a new project (testproj) and subtree
> (/mnt/xfs/foo) associated with the project, create a 25MB file
> inside the subtree, show it being accounted, the copy it outside
> the subtree, show it isn't accounted, then change the project ID
> of the outside copy to testproj and show that it is accounted to
> the testproj even though it is outside the subtree:
>
> # mkfs.xfs -f /dev/ubd/1
> [.....]
> # mount -o prjquota /dev/ubd/1 /mnt/xfs
> # mkdir /mnt/xfs/foo
> #
> #
> # echo testproj:42 >> /etc/projid
> # echo 42:/mnt/xfs/foo >> /etc/projects
> # xfs_quota -x -c 'project -s testproj' /mnt/xfs
> Setting up project testproj (path /mnt/xfs/foo)...
> Processed 1 /etc/projects paths for project testproj
> #
> #
> #
> # xfs_quota -x -c 'limit -p bhard=1g testproj' /mnt/xfs
> # xfs_quota -x -c print /mnt/xfs
> Filesystem          Pathname
> /mnt/xfs            /dev/ubd/1 (pquota)
> /mnt/xfs/foo        /dev/ubd/1 (project 42, testproj)
> # xfs_quota -x -c report /mnt/xfs
> Project quota on /mnt/xfs (/dev/ubd/1)
>                                Blocks
> Project ID       Used       Soft       Hard    Warn/Grace
> ---------- --------------------------------------------------
> testproj            0          0    1048576     00 [--------]
>
> #
> #
> #
> # dd if=/dev/zero of=foo/testfile bs=1024k count=25
> 25+0 records in
> 25+0 records out
> 26214400 bytes (26 MB) copied, 0.116102 s, 226 MB/s
> # sudo xfs_quota -x -c report /mnt/xfs
> Project quota on /mnt/xfs (/dev/ubd/1)
>                                Blocks
> Project ID       Used       Soft       Hard    Warn/Grace
> ---------- --------------------------------------------------
> testproj        25600          0    1048576     00 [--------]
>
> #
> #
> #
> # cp foo/testfile .
> # sync
> # xfs_quota -x -c report /mnt/xfs
> Project quota on /mnt/xfs (/dev/ubd/1)
>                                Blocks
> Project ID       Used       Soft       Hard    Warn/Grace
> ---------- --------------------------------------------------
> testproj        25600          0    1048576     00 [--------]
>
> #
> #
> #
> # xfs_io -f -c "chproj 42" testfile
> # xfs_quota -x -c report /mnt/xfs
> Project quota on /mnt/xfs (/dev/ubd/1)
>                                Blocks
> Project ID       Used       Soft       Hard    Warn/Grace
> ---------- --------------------------------------------------
> testproj        51200          0    1048576     00 [--------]
>
> #
>
>
>> (link, rename are restricted).
>
> The EXDEV on rename behaviour is purely an implementation detail -
> it makes quota accounting in XFS simple. i.e. rename returns EXDEV
> so that a mv(1) will fall back to create/copy/unlink and that
> automatically gets the quota accounting correct. That is, it didn't
> require a complex extension of dquot handling in the rename
> transaction to implement.  This one could be fixed, and a couple of
> ppl have actually asked recently if it could be done because moving
> a few TB of data between projects is time consuming.
>
> However, hard links are a different matter. If you can clearly
> determine how to hard link a file into multiple different projects
> (dquots), then track and account for all the space used in a sane
> manner, work out how to account for new or removed files in such a
> hardlinked directory, etc, then you can allow hard links between
> different subtrees.
Yes, I do understand that. In fact, I initially specified the
rename/link rules myself; later I discovered that XFS had
implemented this a long time ago in exactly the same way.

BTW: renames are also not so simple, because renaming a file which
has more than one hard link results in the same madness.

But as Al Viro pointed out, these semantics are already implemented
by bind mounts (IMHO the only bad thing is that a bind mount is not a
persistent structure).
We just give the user a rope, and it is his decision whether or not
to shoot himself.
Otherwise, we may try to persuade Al Viro to like the hardlink
isolation idea, maybe by restricting this tiny rule under a
CONFIG_PROJECT_ID_ISOLATED config option.
>
> For example, if you add a new file into such a hard linked
> directory, who does it get accounted to? What happens if you then
> move a multiple-hard linked file to a different subtree?
By assumption an inode may belong to only one project, the one which
is stored inside private_inode->i_prjid. It will be accounted to
that quota.
> If the
> inode is accounted to all projects, then each of these filesystem
> transactions requires updating an arbitrary (unbound) number of
> dquots - this alone makes journal reservations for transactions a
> nightmare to calculate and greatly increases the complexity of such
> transactions.
>
> Disallowing hard links between directories in different projects
> makes these cans of worms go away - it is a very practical design
> choice to make. However, it in no way results in XFS project quotas
> being restricted to subtrees - it is a *change of project quota*
> that triggers these behaviours.
>
>> Personally I think what right restriction, but someone may
>> want to have not subtree-like hierarchy. So this patch doesn't introduce
>> any link/rename rules.
>
> The link/rename behaviour of XFS does not prevent this type of usage
> at all.
>
>> If user want to restrict his tree it will use
>> bindmount. IMHO it is more intuitive than XFS does.
>
> XFS is not trying to implement bind mount -like restrictions. The
> behaviour was carefully designed to allow project quota's to be
> sanely implemented.
>
>> But again you definitely right about feature_names/interfaces ambiguity 
>> If we can create common interface it would be great. See later in 
>> the mail.
>> >
>> >> In order to avoid ambiguity i've stopped at the "metagroup" term.
>> >> I hope it is final name for the feature.
>> >
>> > I think "metagroup" is too abstract and will likely be confused with
>> > group quotas by those that don't understand what it is. i.e it does
>> > not convey any information about the bounds of the quota container
>> > (unlike user, group, directory or project).
>> Ok. Since we want common interface we should use well known "project_id"
>> term.
>> 
>> I think we can try to unify it in following way:
>> *User interface*
>> As soon as i understand XFS manage projid via xfs_ioctl_setattr, 
>> struct fsxattr. IMHO it is not good idea to make this interface common
>> for all filesystems. Let's use standard i_op->setxattr/getxattr for
>> this purpose. Let's name this xattr as "system.project_id".
>
> That's fine by me. I'd much prefer that we used the xattr interface
> for inode attributes instead of poking bits through fcntl or ioctls...
>
>> And xfs may easily catch corresponding setxattr/getxatrr and translate
>> it to it's ioctl interface, so both interfaces will be equal.
>> At least xattr interface already supported by various utils (tar,
>> rsync, etc).
>
> Well, the point of the way XFS implements project quotas is that
> utilities such as cp, mv, tar, rsync, etc do not need to know
> anything about them - just like user/group quotas.
>
> If we go down the xattr route, then these utilities can't be allowed
> to copy these xattrs to new files; the filesystem has to create them
> atomically with the new inodes so that they are accounted correctly.
Exactly. It is like the way init_acl and init_security work on inode
creation (see fs/ext4/ialloc.c). I accidentally missed that in the
posted version of the ext4-add-metagroup-support patch.
In fact I have to confess that the ext4-metagroup part was not in a
working state at posting time. Currently it seems to work; see the
attached patch.

BTW, the procedure for changing project_id looks really ugly, because we
have to perform two steps in a row: the quota transfer and the proj_id
update. If we fail to update project_id, we have to roll back the quota
transfer. And in fact the rollback itself may result in -EDQUOT, because
quota currently has no *force* charge flag :)
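
The ordering problem can be modelled in user space. What follows is only
a minimal sketch under stated assumptions: the quota table, the names
toy_transfer() and toy_change_project(), and the "force" flag are all
hypothetical and are not kernel API. It illustrates why a rollback needs
a forced charge: an ordinary transfer back to the old id is subject to
the same limit check and could be refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NPROJ 4

/* Toy per-project accounting: blocks charged and the hard limit. */
struct toy_quota {
	unsigned long used[TOY_NPROJ];
	unsigned long limit[TOY_NPROJ];
};

/* Toy inode: its project id and the number of blocks it owns. */
struct toy_inode {
	unsigned int projid;
	unsigned long blocks;
};

/*
 * Move the inode's charge from project 'from' to project 'to'.
 * Without 'force' the transfer is refused when it would exceed the limit.
 */
static int toy_transfer(struct toy_quota *q, const struct toy_inode *ino,
			unsigned int from, unsigned int to, bool force)
{
	assert(from < TOY_NPROJ && to < TOY_NPROJ);
	if (!force && q->used[to] + ino->blocks > q->limit[to])
		return -EDQUOT;
	q->used[from] -= ino->blocks;
	q->used[to] += ino->blocks;
	return 0;
}

/*
 * Change the project id: transfer the quota first, then update the id.
 * 'update_fails' simulates an error while storing the new id.
 */
static int toy_change_project(struct toy_quota *q, struct toy_inode *ino,
			      unsigned int new_id, bool update_fails)
{
	unsigned int old_id = ino->projid;
	int ret = toy_transfer(q, ino, old_id, new_id, false);

	if (ret)
		return ret;
	if (update_fails) {
		/* Forced charge: the rollback must not fail with -EDQUOT. */
		toy_transfer(q, ino, new_id, old_id, true);
		return -EIO;
	}
	ino->projid = new_id;
	return 0;
}
```

In this model a failed update leaves both the id and the accounting
exactly as they were, which is the property the real code would want.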
> If they are created non-atomically and the system crashes between
> creating the file and applying the quota xattr, then you have an
> inconsistency that only a quotacheck will pick up....
This is just how it works today.
Every tar-like application works like this:
1) open
2) write
3) chown
4) chmod
So anything that happens before (3) is accounted to the current user_id.
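
The four steps can be sketched with plain POSIX calls. This is an
illustration only; extract_like_tar() is a hypothetical helper, not code
taken from tar or rsync. Between open() and fchown() the new file
belongs to the ids of the calling process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extract one file the way a tar-like tool does: create, write, then
 * set ownership and mode as separate steps. Until fchown() runs, the
 * file is owned by (and charged to) the ids of the calling process.
 * Returns 0 on success or a negative errno value.
 */
static int extract_like_tar(const char *path, const char *data,
			    uid_t uid, gid_t gid, mode_t mode)
{
	size_t len;
	int ret = 0;
	int fd;

	assert(path != NULL && data != NULL);
	len = strlen(data);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);	/* 1) open */
	if (fd < 0)
		return -errno;
	if (write(fd, data, len) != (ssize_t)len)		/* 2) write */
		ret = -EIO;
	else if (fchown(fd, uid, gid) < 0)			/* 3) chown */
		ret = -errno;
	else if (fchmod(fd, mode) < 0)				/* 4) chmod */
		ret = -errno;
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

An unprivileged caller can try this with its own ids, for example
extract_like_tar("/tmp/demo", "hello", geteuid(), getegid(), 0640).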

>
>> *Link/Rename behavior*
>>  Let's introduce two modes:
>>  1) SHARED project hierarchy: without restrictions for link/renames
>
> See above - I don't think "without restrictions" can be easily
> implemented because of the complexity hard links introduce.
>
>>  2) ISOLATED project hierarchy: Well known XFS (subtrees like)
>>     link/rename rules
>>  And support these two modes like this:
>>  generic_fs)
>>        SHARED: by default 
>>        ISOLATED: via bindmount
>>  XFS)
>
> This is a change of behaviour from the existing XFS project quota
> configurations as they do not require bind mounts at all.
>
> I'm interested to know how you see this working when you have
> multiple subtrees with the same project ID? Renaming and linking
Yep, good catch.
> between those subtrees is currently possible with XFS project IDs,
> but adding bind mounts would cause EXDEV to be returned for these
> operations. i.e. It seems to me that these subtrees are "shared" by
> your definition, but the addition of bind mounts makes them
> "isolated".
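
The two modes under discussion reduce to one check at rename/link time.
This is a minimal sketch with hypothetical names (toy_mode, toy_node,
toy_may_move), and it is a simplification: it assumes the isolated rule
is "same project id or EXDEV" and ignores bind mounts entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_mode { TOY_SHARED, TOY_ISOLATED };

/* Toy stand-in for an inode; only the project id matters here. */
struct toy_node {
	unsigned int projid;
};

/*
 * Decide whether 'src' may be renamed or linked into directory 'dst_dir'.
 * SHARED: always allowed. ISOLATED: only within the same project id,
 * otherwise -EXDEV, the same error a cross-device rename returns.
 */
static int toy_may_move(enum toy_mode mode, const struct toy_node *src,
			const struct toy_node *dst_dir)
{
	assert(src != NULL && dst_dir != NULL);
	if (mode == TOY_ISOLATED && src->projid != dst_dir->projid)
		return -EXDEV;
	return 0;
}
```

Under this rule two separate subtrees carrying the same project id stay
mutually reachable in isolated mode, which matches the multiple-subtree
case raised above.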
>
> Or you want a part of a subtree to be moved to a different project
> ID because it needs to be accounted separately?  e.g. a group gets
> moved in the organisation hierarchy, so the bean counters want to
> change the project ID on all their files so their space usage can be
> billed to the new department. If bind mounts are involved, this
> quickly becomes complex and unmaintainable. It's not something that
> users can easily manage, especially compared to the current 'xfs_io
> -c "chproj -R <projid>" /path/to/subtree' method of doing this.
It seems we have to work on a "get the vfs people to like isolated
subtrees" plan.
>
> ----
>
> IMO focusing on link/rename restrictions as the deciding factor in
> defining the user interface is wrong. I started out by saying that
> having different user interfaces for different filesystems is not
> desirable. You've ended up trying to encode the differences you
> assume exist into a new user interface instead.
>
> I'll rephrase the question - what part of the existing XFS project
> quota administration interface (i.e. /etc/projects, /etc/projid,  a
> quota command to set up the initial tree, etc) is not sufficient for
> your purposes of defining and managing subtrees?  If it is not
> sufficient, what simple extensions can we add that will make it
> sufficient? Once we've got the high level management interface
> defined, everything else is just details. ;)
>
The XFS interface is enough. IMHO it is rather rich, but all the
necessary things are already there.
>>        ISOLATED: by default, because this is expected semantics (no
>>                  changes required)
>>        SHARED: xfs may add "shared_project" mount feature to disable
>>                isolation semantics. At least this gives user more
>>                flexibility than before.
>>  We have to document this difference in order to avoid misbehavior.
>
>
>> *VFS interface to project_id*
>>  In order to benefit from project_id we have to make it visible to the
>>  vfs layer, and let quota and nfsd (any other users?) use it.
>>  Let's use proposed per-sb aux_attributes table for this purpose.
>
> Why go to that complexity? Just add a 32 bit proj_id identifier to
> the struct inode. If it's supposed to be generic, then simply
> implement it like user and group quotas are.
Of course this is the best solution. But when I added the i_rsv_space
field to vfs_inode to support quota reservation for delayed allocation,
many people were firmly against the idea of bloating vfs_inode.
So I came up with the idea of an aux_inode_table that lets everybody
put their crap into that table without big discussions.

But the project_id case is better, because it can be hidden under a
CONFIG_PROJECT_ID option, so no space is wasted when it is disabled.
In the next round I'll embed it into vfs_inode, and if people really
object to this, we will go back to the aux_inode_table approach.
I plan to post the next version next Monday.


[-- Attachment #2: 0001-ext4-Implement-project-ID-support-for-ext4-filesyste.patch --]
[-- Type: text/plain, Size: 11747 bytes --]

From bb16c459d7d9bee5de5f4a7885ac1edfac0c34aa Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Sat, 20 Feb 2010 13:38:27 +0300
Subject: [PATCH] ext4: Implement project ID support for ext4 filesystem


Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/Kconfig            |    8 ++
 fs/ext4/Makefile           |    1 +
 fs/ext4/ext4.h             |    8 ++-
 fs/ext4/ialloc.c           |    8 ++-
 fs/ext4/inode.c            |   13 +++-
 fs/ext4/super.c            |    9 ++-
 fs/ext4/xattr.c            |    7 ++
 fs/ext4/xattr.h            |   19 +++++
 fs/ext4/xattr_project_id.c |  169 ++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100644 fs/ext4/xattr_project_id.c

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 9ed1bb1..1c04c9f 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -74,6 +74,14 @@ config EXT4_FS_SECURITY
 
 	  If you are not using a security module that requires using
 	  extended attributes for file security labels, say N.
+config EXT4_PROJECT_ID
+	bool "Ext4 project_id support"
+	depends on PROJECT_ID
+	depends on EXT4_FS_XATTR
+	help
+	  Enables project inode identifier support for the ext4 filesystem.
+	  This feature allows an id to be assigned to inodes, similar to
+	  uid/gid.
 
 config EXT4_DEBUG
 	bool "EXT4 debugging support"
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8867b2a..04080cd 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -11,3 +11,4 @@ ext4-y	:= balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
 ext4-$(CONFIG_EXT4_FS_XATTR)		+= xattr.o xattr_user.o xattr_trusted.o
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL)	+= acl.o
 ext4-$(CONFIG_EXT4_FS_SECURITY)		+= xattr_security.o
+ext4-$(CONFIG_EXT4_PROJECT_ID)		+= xattr_project_id.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b2c01a2..bc5c919 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -719,6 +719,10 @@ struct ext4_inode_info {
 	 */
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
+#ifdef CONFIG_EXT4_PROJECT_ID
+	/* project_id id, additional owner identifier similar to uid/gid */
+	unsigned int i_mid;
+#endif
 };
 
 /*
@@ -766,7 +770,9 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DELALLOC		0x8000000LL /* Delalloc support */
 #define EXT4_MOUNT_DATA_ERR_ABORT	0x10000000LL /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY	0x20000000LL /* Block validity checking */
-#define EXT4_MOUNT_DISCARD		0x40000000LL /* Issue DISCARD requests */
+#define EXT4_MOUNT_DISCARD		0x40000000LL /* Issue DISCARD requests 
+*/
+#define EXT4_MOUNT_PROJECT_ID		0x80000000LL /* extended owner id */
 
 #define clear_opt(o, opt)		o &= ~EXT4_MOUNT_##opt
 #define set_opt(o, opt)			o |= EXT4_MOUNT_##opt
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index f3624ea..ae88188 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1032,7 +1032,10 @@ got:
 	ei->i_state = EXT4_STATE_NEW;
 
 	ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;
-
+#ifdef CONFIG_EXT4_PROJECT_ID
+	/* XXX: move this to generic inode init helper */
+	ei->i_mid = EXT4_I(dir)->i_mid;
+#endif
 	ret = inode;
 	if (vfs_dq_alloc_inode(inode)) {
 		err = -EDQUOT;
@@ -1046,6 +1049,9 @@ got:
 	err = ext4_init_security(handle, inode, dir);
 	if (err)
 		goto fail_free_drop;
+	err = ext4_prjid_write(handle, inode, ei->i_mid, XATTR_CREATE);
+	if (err)
+		goto fail_free_drop;
 
 	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		/* set extent flag only for directory, file and normal symlink*/
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e119524..59d5cf1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4936,7 +4936,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	}
 	if (ret)
 		goto bad_inode;
-
+#ifdef CONFIG_EXT4_PROJECT_ID
+	if(test_opt(inode->i_sb, PROJECT_ID)) {
+		ret = ext4_prjid_read(inode, &ei->i_mid);
+		if (ret == -ENODATA) {
+			ei->i_mid = 0;
+			ret = 0;
+		}
+		if (ret)
+			goto bad_inode;
+	} else
+		ei->i_mid = 0;
+#endif
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b8fbab..12b7c2d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -923,6 +923,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, DISCARD))
 		seq_puts(seq, ",discard");
 
+	if (test_opt(sb, PROJECT_ID))
+		seq_puts(seq, ",project_id");
+
 	if (test_opt(sb, NOLOAD))
 		seq_puts(seq, ",norecovery");
 
@@ -1113,7 +1116,7 @@ enum {
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc,
 	Opt_block_validity, Opt_noblock_validity,
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
-	Opt_discard, Opt_nodiscard,
+	Opt_discard, Opt_nodiscard, Opt_project_id,
 };
 
 static const match_table_t tokens = {
@@ -1182,6 +1185,7 @@ static const match_table_t tokens = {
 	{Opt_noauto_da_alloc, "noauto_da_alloc"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_project_id, "project_id"},
 	{Opt_err, NULL},
 };
 
@@ -1613,6 +1617,9 @@ set_qf_format:
 		case Opt_nodiscard:
 			clear_opt(sbi->s_mount_opt, DISCARD);
 			break;
+		case Opt_project_id:
+			set_opt(sbi->s_mount_opt, PROJECT_ID);
+			break;
 		default:
 			ext4_msg(sb, KERN_ERR,
 			       "Unrecognized mount option \"%s\" "
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index f3a2f7e..4dce406 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -107,6 +107,10 @@ static struct xattr_handler *ext4_xattr_handler_map[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	[EXT4_XATTR_INDEX_SECURITY]	     = &ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_PROJECT_ID
+	[EXT4_XATTR_INDEX_PROJECT_ID]	     = &ext4_xattr_prjid_handler,
+#endif
+
 };
 
 struct xattr_handler *ext4_xattr_handlers[] = {
@@ -119,6 +123,9 @@ struct xattr_handler *ext4_xattr_handlers[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	&ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_PROJECT_ID
+	&ext4_xattr_prjid_handler,
+#endif
 	NULL
 };
 
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 8ede88b..f794719 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -21,6 +21,7 @@
 #define EXT4_XATTR_INDEX_TRUSTED		4
 #define	EXT4_XATTR_INDEX_LUSTRE			5
 #define EXT4_XATTR_INDEX_SECURITY	        6
+#define EXT4_XATTR_INDEX_PROJECT_ID	        7
 
 struct ext4_xattr_header {
 	__le32	h_magic;	/* magic number for identification */
@@ -70,6 +71,7 @@ extern struct xattr_handler ext4_xattr_trusted_handler;
 extern struct xattr_handler ext4_xattr_acl_access_handler;
 extern struct xattr_handler ext4_xattr_acl_default_handler;
 extern struct xattr_handler ext4_xattr_security_handler;
+extern struct xattr_handler ext4_xattr_prjid_handler;
 
 extern ssize_t ext4_listxattr(struct dentry *, char *, size_t);
 
@@ -153,3 +155,20 @@ static inline int ext4_init_security(handle_t *handle, struct inode *inode,
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_EXT4_PROJECT_ID
+extern int ext4_prjid_read(struct inode *inode, unsigned int *mid);
+extern int ext4_prjid_write(handle_t *handle, struct inode *inode,
+				unsigned int mid, int xflags);
+#else
+static inline int ext4_prjid_read(struct inode *inode, unsigned int *mid)
+{
+	return -ENOTSUPP;
+}
+static inline int ext4_prjid_write(handle_t *handle, struct inode *inode,
+				unsigned int mid, int xflags)
+{
+	return -ENOTSUPP;
+}
+
+#endif
diff --git a/fs/ext4/xattr_project_id.c b/fs/ext4/xattr_project_id.c
new file mode 100644
index 0000000..1812e47
--- /dev/null
+++ b/fs/ext4/xattr_project_id.c
@@ -0,0 +1,169 @@
+/*
+ * linux/fs/ext4/xattr_project_id.c
+ *
+ * Copyright (C) 2010 Parallels Inc
+ * Dmitry Monakhov <dmonakhov@openvz.org>
+ */
+
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/capability.h>
+#include <linux/fs.h>
+#include <linux/quotaops.h>
+#include "ext4_jbd2.h"
+#include "ext4.h"
+#include "xattr.h"
+
+/*
+ * Read project_id id from inode's xattr
+ * Locking: none
+ */
+int ext4_prjid_read(struct inode *inode, unsigned int *mid)
+{
+	__le32 dsk_mid;
+	int retval;
+	retval = ext4_xattr_get(inode, EXT4_XATTR_INDEX_PROJECT_ID, "",
+				&dsk_mid, sizeof (dsk_mid));
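+	/* Negative values are errors, e.g. -ENODATA when no xattr is set */
+	if (retval < 0)
+		return retval;
+	/* The on-disk value must be exactly one __le32 */
+	if (retval != sizeof(dsk_mid))
+		return -EIO;
+	*mid = le32_to_cpu(dsk_mid);
+	return 0;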
+
+}
+
+/*
+ * Save project_id id to inode's xattr
+ * Locking: none
+ */
+int ext4_prjid_write(handle_t *handle, struct inode *inode,
+				unsigned int mid, int xflags)
+{
+	__le32 dsk_mid = cpu_to_le32(mid);
+	int retval;
+	retval = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_PROJECT_ID, "",
+				&dsk_mid, sizeof (dsk_mid), xflags);
+	if (retval > 0) {
+		if (retval != sizeof(dsk_mid))
+			retval =  -EIO;
+		else
+			retval = 0;
+	}
+	return retval;
+}
+
+/*
+ * Change project_id id.
+ * Called under inode->i_mutex
+ */
+static int ext4_prjid_change(struct inode *inode, unsigned int new_mid)
+{
+	/*
+	 * One data_trans_blocks chunk for xattr update.
+	 * One quota_trans_blocks chunk for quota transfer, and one
+	 * quota_trans_block chunk for emergency quota rollback transfer,
+	 * because quota rollback may result new quota blocks allocation.
+	 */
+	unsigned credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
+		EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb) * 2;
+	qid_t qid[MAXQUOTAS];
+	int ret, ret2 = 0;
+	unsigned retries = 0;
+	handle_t *handle;
+
+	vfs_dq_init(inode);
+retry:
+	handle = ext4_journal_start(inode, credits);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		ext4_std_error(inode->i_sb, ret);
+		goto out;
+	}
+	/* Inode may not have project_id xattr yet. Create it explicitly */
+	ret = ext4_prjid_write(handle, inode, EXT4_I(inode)->i_mid,
+			XATTR_CREATE);
+	if (ret == -EEXIST)
+		ret = 0;
+	if (ret) {
+		ext4_journal_stop(handle);
+		if (ret == -ENOSPC &&
+			ext4_should_retry_alloc(inode->i_sb, &retries))
+			goto retry;
+		/* The handle is stopped; do not fall through and reuse it */
+		return ret;
+	}
+#ifdef CONFIG_QUOTA
+	qid[PRJQUOTA] = new_mid;
+	if (!ret && inode->i_sb->dq_op->transfer(inode, qid, 1 << PRJQUOTA))
+		ret = -EDQUOT;
+#endif
+	if (!ret) {
+		ret = ext4_prjid_write(handle, inode, new_mid, XATTR_REPLACE);
+		if (ret) {
+			/*
+			 * Fatal error. Nevertheless, try to roll back the
+			 * quota transfer and report the write error.
+			 */
+#ifdef CONFIG_QUOTA
+			qid[PRJQUOTA] = EXT4_I(inode)->i_mid;
+			inode->i_sb->dq_op->transfer(inode, qid, 1 << PRJQUOTA);
+#endif
+			ext4_std_error(inode->i_sb, ret);
+		} else
+			EXT4_I(inode)->i_mid = new_mid;
+	}
+	ret2 = ext4_journal_stop(handle);
+out:
+	if (ret2)
+		ret = ret2;
+	return ret;
+}
+static size_t
+ext4_xattr_prjid_list(struct dentry *dentry, char *list, size_t list_size,
+		const char *name, size_t name_len, int type)
+{
+	if (list && XATTR_PRJID_LEN <= list_size)
+		memcpy(list, XATTR_PRJID, XATTR_PRJID_LEN);
+	return XATTR_PRJID_LEN;
+
+}
+
+static int
+ext4_xattr_prjid_get(struct dentry *dentry, const char *name,
+		       void *buffer, size_t size, int type)
+{
+	int ret;
+	unsigned mid;
+	char buf[32];
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	ret = ext4_prjid_read(dentry->d_inode, &mid);
+	if (ret)
+		return ret;
+	ret = snprintf(buf, sizeof(buf), "%u", mid);
+	if (buffer)	/* a NULL buffer is a size query, copy nothing */
+		memcpy(buffer, buf, min_t(size_t, ret, size));
+	return ret;
+}
+
+static int
+ext4_xattr_prjid_set(struct dentry *dentry, const char *name,
+		const void *value, size_t size, int flags, int type)
+{
+	unsigned int new_mid;
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	new_mid = simple_strtoul(value, (char **)&value, 0);
+	return ext4_prjid_change(dentry->d_inode, new_mid);
+}
+
+struct xattr_handler ext4_xattr_prjid_handler = {
+	.prefix	= XATTR_PRJID,
+	.list	= ext4_xattr_prjid_list,
+	.get	= ext4_xattr_prjid_get,
+	.set	= ext4_xattr_prjid_set,
+};
-- 
1.6.6
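
The get/set handlers in the patch above convert between a decimal string
and a 32-bit id. Below is a user-space sketch of that conversion,
assuming (as is normal for xattr values) that the value passed to the
set handler is a byte range that need not be NUL-terminated. The names
toy_parse_projid() and toy_format_projid() are hypothetical, and the
-ERANGE behaviour is the usual getxattr convention, not something taken
from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a decimal project id from an xattr value of 'size' bytes.
 * The value is not assumed to be NUL-terminated, so it is copied into
 * a bounded local buffer first. Returns 0 or a negative errno value.
 */
static int toy_parse_projid(const void *value, size_t size, uint32_t *id)
{
	char buf[16];
	char *end;
	unsigned long long v;

	assert(value != NULL && id != NULL);
	if (size == 0 || size >= sizeof(buf))
		return -EINVAL;
	memcpy(buf, value, size);
	buf[size] = '\0';
	/* Reject signs and leading whitespace that strtoull() would accept. */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;
	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -EINVAL;
	*id = (uint32_t)v;
	return 0;
}

/*
 * Format a project id as decimal text. With a NULL buffer only the
 * required length is reported; a too-small buffer yields -ERANGE.
 * Returns the number of bytes written, without a trailing NUL.
 */
static int toy_format_projid(uint32_t id, char *buffer, size_t size)
{
	char buf[16];
	int len = snprintf(buf, sizeof(buf), "%u", (unsigned int)id);

	if (buffer == NULL)
		return len;
	if ((size_t)len > size)
		return -ERANGE;
	memcpy(buffer, buf, (size_t)len);
	return len;
}
```

Copying into a local buffer before parsing is the point of the sketch:
the parser never reads past the 'size' bytes the caller supplied.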

