* [PATCH 0/6] RFC: introduce extended inode owner identifier v4
@ 2010-02-18 16:45 Dmitry Monakhov
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
  2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC
To: linux-fsdevel; +Cc: Dmitry Monakhov

This is a new attempt to add an extended inode identifier.
In previous posts it was called tree_id, subtree_id, project_id,
but none of these names was good enough. I've rejected project_id
because it is a well-known XFS feature, and my implementation is
slightly different from it, especially from the user-space point of view.
To avoid ambiguity I've settled on the term "metagroup".
I hope it is the final name for the feature.

*Feature description*
1) An inode may have a metagroup identifier, which has the same meaning
   as uid/gid.
2) The id is stored in the inode's xattr named "system.metagroup".
3) The id is inherited from the parent inode on creation.
4) The id is cached in the in-memory inode structure (inside
   fsprivate_inode) and is accessible from the vfs layer.
5) Since the id is cached in memory, it may be used for different
   purposes, such as:
5A) Implementing an additional quota id space orthogonal to uid/gid.
    This is useful for managing quota for some filesystem hierarchy
    (a chroot or a container over a bind mount).
5B) Exporting a dedicated fs hierarchy to nfsd (only inodes which have
    some metagroup will be accessible via nfsd).

*Implementation details*
It is unlikely that everybody will be happy to have a new field in
vfs_inode (one which is not widely used). That's why this field is stored
inside private_inode. But we have to have access to this private field.
A similar issue was first resolved while implementing the generic
quota reserved_space management interface. Jan suggested implementing
some sort of auxiliary inode attribute map and accessing non-standard
inode attributes via this aux_attr_map.
I've implemented this idea in the form of a per-sb aux_attribute table.
(Macros are not good here, because different attributes may have
different types, which results in massive typecasting.)
If someone has better ideas, please say so.
To give an overview of this interface I've converted quota's
reserved space interface to the new aux_attr_table.

Once we have a generic interface for auxiliary attributes, each
filesystem may implement metagroup support in its own manner.
This should be done in the following steps:
1) Add a field to private_inode and export it via aux_attribute.
2) Implement id inheritance on inode creation.
3) Implement a handler for the "system.metagroup" xattr.
This patch set contains an example implementation of this for ext4.

The patch-set is compile tested only.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 12b1aa1..c9aef25 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 33
-EXTRAVERSION = -rc8
+EXTRAVERSION = -rc8-metagroup
 NAME = Man-Eating Seals of Antiquity
 
 # *DOCUMENTATION*
-- 
1.6.6
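For readers who want the shape of the idea before reading the patches, here is a minimal user-space sketch of the first two implementation steps from the cover letter: a private field exported through a per-superblock accessor table, and inheritance from the parent on creation. Every toy_* name is hypothetical and is not kernel API; returning 0 when a filesystem exports no metagroup is an assumption of this sketch, not something the series specifies for the generic layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inode;

/* Per-superblock table of optional accessors; a NULL slot means
 * the filesystem does not export that attribute. */
struct toy_aux_attributes {
	uint32_t *(*metagroup)(struct toy_inode *inode);
};

struct toy_super {
	const struct toy_aux_attributes *s_aux_attr;
};

struct toy_inode {
	struct toy_super *i_sb;
	uint32_t i_mid;	/* stands in for a field of the fs-private inode */
};

/* Step 1: the filesystem exports its private field through the table. */
static uint32_t *toy_metagroup(struct toy_inode *inode)
{
	return &inode->i_mid;
}

static const struct toy_aux_attributes toy_aux_attr = {
	.metagroup = toy_metagroup,
};

/* Generic-layer lookup. Assumption: 0 means "no metagroup exported". */
static uint32_t toy_inode_metagroup(struct toy_inode *inode)
{
	const struct toy_aux_attributes *aux = inode->i_sb->s_aux_attr;

	if (!aux || !aux->metagroup)
		return 0;
	return *aux->metagroup(inode);
}

/* Step 2: a new inode inherits the id of its parent directory. */
static void toy_init_child(struct toy_inode *child, struct toy_inode *dir)
{
	child->i_sb = dir->i_sb;
	child->i_mid = dir->i_mid;
}
```

The accessor returns a pointer rather than a value so that generic code can both read and update the field without knowing the layout of the private inode, which appears to be the same convention the quota reserved-space accessor uses.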
* [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 16:45 [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dmitry Monakhov
@ 2010-02-18 16:45 ` Dmitry Monakhov
  2010-02-18 16:45   ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
  2010-02-18 19:00   ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
  2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
  1 sibling, 2 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC
To: linux-fsdevel; +Cc: Dmitry Monakhov

Sometimes it is useful to export non-standard attributes
to the generic vfs layer, but it is too expensive to store them
inside the vfs inode. Let's introduce a generic interface for this
purpose. One may declare an attribute, and the filesystem provides
access to it if necessary.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 include/linux/fs.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b1bcb27..c510ef7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -384,6 +384,7 @@ struct inodes_stat_t {
 #include <asm/byteorder.h>
 
 struct export_operations;
+struct aux_attributes;
 struct hd_geometry;
 struct iovec;
 struct nameidata;
@@ -1323,6 +1324,7 @@ struct super_block {
 	const struct dquot_operations	*dq_op;
 	const struct quotactl_ops	*s_qcop;
 	const struct export_operations *s_export_op;
+	const struct aux_attributes *s_aux_attr;
 	unsigned long		s_flags;
 	unsigned long		s_magic;
 	struct dentry		*s_root;
@@ -1576,7 +1578,10 @@ struct super_operations {
 #endif
 	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
 };
-
+struct aux_attributes
+{
+	int supported;
+};
 /*
  * Inode state bits.  Protected by inode_lock.
  *
-- 
1.6.6
* [PATCH 2/6] quota: switch reservation space management to aux_attribute 2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov @ 2010-02-18 16:45 ` Dmitry Monakhov 2010-02-18 16:45 ` [PATCH 3/6] vfs: Add additional owner identifier Dmitry Monakhov 2010-02-18 19:00 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer 1 sibling, 1 reply; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw) To: linux-fsdevel; +Cc: Dmitry Monakhov Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ext4/super.c | 11 +++++++---- fs/quota/dquot.c | 7 ++++--- include/linux/fs.h | 5 +++++ include/linux/quota.h | 3 --- 4 files changed, 16 insertions(+), 10 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 735c20d..84a51d9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1018,9 +1018,6 @@ static const struct dquot_operations ext4_quota_operations = { .reserve_space = dquot_reserve_space, .claim_space = dquot_claim_space, .release_rsv = dquot_release_reserved_space, -#ifdef CONFIG_QUOTA - .get_reserved_space = ext4_get_reserved_space, -#endif .alloc_inode = dquot_alloc_inode, .free_space = dquot_free_space, .free_inode = dquot_free_inode, @@ -1033,7 +1030,13 @@ static const struct dquot_operations ext4_quota_operations = { .alloc_dquot = dquot_alloc, .destroy_dquot = dquot_destroy, }; - +static const struct aux_attributes ext4_aux_attr = +{ + .supported = 1, +#ifdef CONFIG_QUOTA + .reserved_space = ext4_get_reserved_space, +#endif +}; static const struct quotactl_ops ext4_qctl_operations = { .quota_on = ext4_quota_on, .quota_off = vfs_quota_off, diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c index 4d2041f..de4b8fc 100644 --- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -1405,8 +1405,8 @@ static qsize_t *inode_reserved_space(struct inode * inode) { /* Filesystem must explicitly define it's own method in order to use * quota reservation interface */ - 
BUG_ON(!inode->i_sb->dq_op->get_reserved_space); - return inode->i_sb->dq_op->get_reserved_space(inode); + BUG_ON(!inode->i_sb->s_aux_attr->reserved_space); + return inode->i_sb->s_aux_attr->reserved_space(inode); } void inode_add_rsv_space(struct inode *inode, qsize_t number) @@ -1438,7 +1438,8 @@ static qsize_t inode_get_rsv_space(struct inode *inode) { qsize_t ret; - if (!inode->i_sb->dq_op->get_reserved_space) + if (!inode->i_sb->s_aux_attr || + !inode->i_sb->s_aux_attr->reserved_space) return 0; spin_lock(&inode->i_lock); ret = *inode_reserved_space(inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index c510ef7..0cd0105 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1581,6 +1581,11 @@ struct super_operations { struct aux_attributes { int supported; +#ifdef CONFIG_QUOTA + /* Delay allocation space reservation managed internally by quota, + * and protected by i_lock similar to i_blocks+i_bytes. */ + qsize_t* (*reserved_space)(struct inode *inode); +#endif }; /* * Inode state bits. Protected by inode_lock. diff --git a/include/linux/quota.h b/include/linux/quota.h index edf34f2..680605d 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -315,9 +315,6 @@ struct dquot_operations { int (*claim_space) (struct inode *, qsize_t); /* release rsved quota for delayed alloc */ void (*release_rsv) (struct inode *, qsize_t); - /* get reserved quota for delayed alloc, value returned is managed by - * quota code only */ - qsize_t *(*get_reserved_space) (struct inode *); }; /* Operations handling requests from userspace */ -- 1.6.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
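As an illustration of the accessor convention this patch moves to, here is a small user-space sketch of reserved-space accounting through a pointer-returning callback. The toy_* names are hypothetical, locking (i_lock in the kernel) is deliberately left out, and returning -1 from the write side is a choice of this sketch; the posted code uses BUG_ON() there instead.

```c
#include <assert.h>
#include <stddef.h>

typedef long long toy_qsize_t;

struct toy_inode;

struct toy_aux_attributes {
	/* Returns a pointer so generic code can update the counter in place. */
	toy_qsize_t *(*reserved_space)(struct toy_inode *inode);
};

struct toy_inode {
	const struct toy_aux_attributes *aux;	/* stands in for i_sb->s_aux_attr */
	toy_qsize_t i_reserved;			/* counter kept in the fs-private inode */
};

static toy_qsize_t *toy_reserved_space(struct toy_inode *inode)
{
	return &inode->i_reserved;
}

static const struct toy_aux_attributes toy_aux_attr = {
	.reserved_space = toy_reserved_space,
};

/* Read side: tolerate a missing table or a missing accessor. */
static toy_qsize_t toy_get_rsv_space(struct toy_inode *inode)
{
	if (!inode->aux || !inode->aux->reserved_space)
		return 0;
	return *inode->aux->reserved_space(inode);
}

/* Write side: check the table pointer before the accessor, so a
 * filesystem without any table fails cleanly instead of faulting. */
static int toy_add_rsv_space(struct toy_inode *inode, toy_qsize_t number)
{
	if (!inode->aux || !inode->aux->reserved_space)
		return -1;
	*inode->aux->reserved_space(inode) += number;
	return 0;
}
```

Note that in the hunk above inode_get_rsv_space() checks s_aux_attr itself before the accessor, while inode_reserved_space() only checks the accessor; the sketch checks both in both paths.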
* [PATCH 3/6] vfs: Add additional owner identifier 2010-02-18 16:45 ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov @ 2010-02-18 16:45 ` Dmitry Monakhov 2010-02-18 16:45 ` [PATCH 4/6] quota: Implement metagroup support for quota Dmitry Monakhov 0 siblings, 1 reply; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw) To: linux-fsdevel; +Cc: Dmitry Monakhov Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/Kconfig | 7 +++++++ include/linux/fs.h | 5 +++++ include/linux/xattr.h | 3 +++ 3 files changed, 15 insertions(+), 0 deletions(-) diff --git a/fs/Kconfig b/fs/Kconfig index 64d44ef..ad47589 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -54,6 +54,13 @@ config FILE_LOCKING This option enables standard file locking support, required for filesystems like NFS and for the flock() system call. Disabling this option saves about 11k. +config METAGROUP + bool "Enable metagroup inode identifier" + default y + help + This option enables metagroup inode identifier. Metagroup + may be used as auxiliary owner specifier in addition to + standard uid/gid. source "fs/notify/Kconfig" diff --git a/include/linux/fs.h b/include/linux/fs.h index 0cd0105..f1139ed 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1586,6 +1586,11 @@ struct aux_attributes * and protected by i_lock similar to i_blocks+i_bytes. */ qsize_t* (*reserved_space)(struct inode *inode); #endif +#ifdef CONFIG_METAGROUP + /* Metagroup id, protected by i_mutex similar to i_uid/i_gid*/ + uid_t* (*metagroup)(struct inode *inode); +#endif + }; /* * Inode state bits. Protected by inode_lock. diff --git a/include/linux/xattr.h b/include/linux/xattr.h index fb9b7e6..efd9ed1 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -33,6 +33,9 @@ #define XATTR_USER_PREFIX "user." 
#define XATTR_USER_PREFIX_LEN (sizeof (XATTR_USER_PREFIX) - 1) +#define XATTR_METAGROUP "system.metagroup" +#define XATTR_METAGROUP_LEN (sizeof (XATTR_METAGROUP)) + struct inode; struct dentry; -- 1.6.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/6] quota: Implement metagroup support for quota
  2010-02-18 16:45 ` [PATCH 3/6] vfs: Add additional owner identifier Dmitry Monakhov
@ 2010-02-18 16:45 ` Dmitry Monakhov
  2010-02-18 16:45   ` [PATCH 5/6] ext4: enlarge mount option field Dmitry Monakhov
  0 siblings, 1 reply; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 16:45 UTC
To: linux-fsdevel; +Cc: Dmitry Monakhov

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/quota/dquot.c      |   12 ++++++++++++
 fs/quota/quotaio_v2.h |    6 ++++--
 include/linux/quota.h |   12 +++++++++++-
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index de4b8fc..40075ea 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1090,6 +1090,11 @@ static int need_print_warning(struct dquot *dquot)
 			return current_fsuid() == dquot->dq_id;
 		case GRPQUOTA:
 			return in_group_p(dquot->dq_id);
+		case MGRQUOTA:
+			/* XXX: Currently there is no way to tell which
+			   metagroup this task belongs to, so print
+			   a warning message unconditionally. -dmon */
+			return 1;
 	}
 	return 0;
 }
@@ -1322,6 +1327,13 @@ int dquot_initialize(struct inode *inode, int type)
 			case GRPQUOTA:
 				id = inode->i_gid;
 				break;
+			case MGRQUOTA:
+				if (inode->i_sb->s_aux_attr &&
+					inode->i_sb->s_aux_attr->metagroup)
+					id = *inode->i_sb->s_aux_attr->metagroup(inode);
+				else
+					BUG_ON(sb_has_quota_loaded(inode->i_sb, MGRQUOTA));
+				break;
 		}
 		got[cnt] = dqget(sb, id, cnt);
 	}
diff --git a/fs/quota/quotaio_v2.h b/fs/quota/quotaio_v2.h
index f1966b4..c65c7fc 100644
--- a/fs/quota/quotaio_v2.h
+++ b/fs/quota/quotaio_v2.h
@@ -13,12 +13,14 @@
  */
 #define V2_INITQMAGICS {\
 	0xd9c01f11,	/* USRQUOTA */\
-	0xd9c01927	/* GRPQUOTA */\
+	0xd9c01927,	/* GRPQUOTA */\
+	0xd9c03f14	/* MGRQUOTA */\
 }
 
 #define V2_INITQVERSIONS {\
 	1,		/* USRQUOTA */\
-	1		/* GRPQUOTA */\
+	1,		/* GRPQUOTA */ \
+	1		/* MGRQUOTA */\
 }
 
 /* First generic header */
diff --git a/include/linux/quota.h b/include/linux/quota.h
index 680605d..a8f6cbe 100644
--- a/include/linux/quota.h
+++ b/include/linux/quota.h
@@ -36,18 +36,28 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 
-#define __DQUOT_VERSION__	"dquot_6.5.2"
+#define __DQUOT_VERSION__	"dquot_6.6.0"
 
+#ifdef CONFIG_METAGROUP
+#define MAXQUOTAS 3
+#else
 #define MAXQUOTAS 2
+#endif
+
 #define USRQUOTA  0		/* element used for user quotas */
 #define GRPQUOTA  1		/* element used for group quotas */
 
+#ifdef CONFIG_METAGROUP
+#define MGRQUOTA  2		/* element used for metagroup quotas */
+#endif
+
 /*
  * Definitions for the default names of the quotas files.
  */
 #define INITQFNAMES { \
 	"user",    /* USRQUOTA */ \
 	"group",   /* GRPQUOTA */ \
+	"metagroup", /* MGRQUOTA */ \
 	"undefined", \
 };
 
-- 
1.6.6
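In this patch MAXQUOTAS and MGRQUOTA depend on CONFIG_METAGROUP, while the new entries in V2_INITQMAGICS, V2_INITQVERSIONS and INITQFNAMES appear to be added unconditionally. One way to keep such per-type tables in step with the type count is a compile-time check; the following is only a user-space sketch with hypothetical toy_* names, not a proposed kernel change.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_quota_type {
	TOY_USRQUOTA,
	TOY_GRPQUOTA,
	TOY_MGRQUOTA,
	TOY_MAXQUOTAS	/* always equals the number of types above */
};

static const char *const toy_qfnames[] = {
	"user",		/* TOY_USRQUOTA */
	"group",	/* TOY_GRPQUOTA */
	"metagroup",	/* TOY_MGRQUOTA */
};

/* Fails the build if a type is added without a matching name. */
_Static_assert(sizeof(toy_qfnames) / sizeof(toy_qfnames[0]) == TOY_MAXQUOTAS,
	       "toy_qfnames must have one entry per quota type");

struct toy_owner {
	uint32_t uid;
	uint32_t gid;
	uint32_t mid;	/* metagroup id */
};

/* Pick the id a quota of the given type is charged to.
 * Returns 0 on success, -1 for an unknown type. */
static int toy_quota_id(const struct toy_owner *o, int type, uint32_t *id)
{
	switch (type) {
	case TOY_USRQUOTA:
		*id = o->uid;
		return 0;
	case TOY_GRPQUOTA:
		*id = o->gid;
		return 0;
	case TOY_MGRQUOTA:
		*id = o->mid;
		return 0;
	}
	return -1;
}
```

The switch mirrors the id selection in dquot_initialize(), with the metagroup id taken from a plain field here instead of the aux attribute accessor.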
* [PATCH 5/6] ext4: enlarge mount option field 2010-02-18 16:45 ` [PATCH 4/6] quota: Implement metagroup support for quota Dmitry Monakhov @ 2010-02-18 16:45 ` Dmitry Monakhov 2010-02-18 16:45 ` [PATCH 6/6] ext4: Implement metagroup support for ext4 filesystem Dmitry Monakhov 0 siblings, 1 reply; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw) To: linux-fsdevel; +Cc: Dmitry Monakhov Currently only one bit left in s_mount_opt. Let's double size it for future purposes. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ext4/ext4.h | 62 +++++++++++++++++++++++++------------------------- fs/ext4/ext4_jbd2.c | 2 +- fs/ext4/super.c | 2 +- 3 files changed, 33 insertions(+), 33 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 874d169..b2c01a2 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -421,7 +421,7 @@ struct ext4_new_group_data { * Mount options */ struct ext4_mount_options { - unsigned long s_mount_opt; + unsigned long long s_mount_opt; uid_t s_resuid; gid_t s_resgid; unsigned long s_commit_interval; @@ -738,35 +738,35 @@ struct ext4_inode_info { /* * Mount flags */ -#define EXT4_MOUNT_OLDALLOC 0x00002 /* Don't use the new Orlov allocator */ -#define EXT4_MOUNT_GRPID 0x00004 /* Create files with directory's group */ -#define EXT4_MOUNT_DEBUG 0x00008 /* Some debugging messages */ -#define EXT4_MOUNT_ERRORS_CONT 0x00010 /* Continue on errors */ -#define EXT4_MOUNT_ERRORS_RO 0x00020 /* Remount fs ro on errors */ -#define EXT4_MOUNT_ERRORS_PANIC 0x00040 /* Panic on errors */ -#define EXT4_MOUNT_MINIX_DF 0x00080 /* Mimics the Minix statfs */ -#define EXT4_MOUNT_NOLOAD 0x00100 /* Don't use existing journal*/ -#define EXT4_MOUNT_DATA_FLAGS 0x00C00 /* Mode for data writes: */ -#define EXT4_MOUNT_JOURNAL_DATA 0x00400 /* Write data to journal */ -#define EXT4_MOUNT_ORDERED_DATA 0x00800 /* Flush data before commit */ -#define EXT4_MOUNT_WRITEBACK_DATA 0x00C00 /* No data ordering */ -#define 
EXT4_MOUNT_UPDATE_JOURNAL 0x01000 /* Update the journal format */ -#define EXT4_MOUNT_NO_UID32 0x02000 /* Disable 32-bit UIDs */ -#define EXT4_MOUNT_XATTR_USER 0x04000 /* Extended user attributes */ -#define EXT4_MOUNT_POSIX_ACL 0x08000 /* POSIX Access Control Lists */ -#define EXT4_MOUNT_NO_AUTO_DA_ALLOC 0x10000 /* No auto delalloc mapping */ -#define EXT4_MOUNT_BARRIER 0x20000 /* Use block barriers */ -#define EXT4_MOUNT_NOBH 0x40000 /* No bufferheads */ -#define EXT4_MOUNT_QUOTA 0x80000 /* Some quota option set */ -#define EXT4_MOUNT_USRQUOTA 0x100000 /* "old" user quota */ -#define EXT4_MOUNT_GRPQUOTA 0x200000 /* "old" group quota */ -#define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */ -#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */ -#define EXT4_MOUNT_I_VERSION 0x2000000 /* i_version support */ -#define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */ -#define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */ -#define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */ -#define EXT4_MOUNT_DISCARD 0x40000000 /* Issue DISCARD requests */ +#define EXT4_MOUNT_OLDALLOC 0x00002LL /* Don't use the new Orlov allocator */ +#define EXT4_MOUNT_GRPID 0x00004LL /* Create files with directory's group */ +#define EXT4_MOUNT_DEBUG 0x00008LL /* Some debugging messages */ +#define EXT4_MOUNT_ERRORS_CONT 0x00010LL /* Continue on errors */ +#define EXT4_MOUNT_ERRORS_RO 0x00020LL /* Remount fs ro on errors */ +#define EXT4_MOUNT_ERRORS_PANIC 0x00040LL /* Panic on errors */ +#define EXT4_MOUNT_MINIX_DF 0x00080LL /* Mimics the Minix statfs */ +#define EXT4_MOUNT_NOLOAD 0x00100LL /* Don't use existing journal*/ +#define EXT4_MOUNT_DATA_FLAGS 0x00C00LL /* Mode for data writes: */ +#define EXT4_MOUNT_JOURNAL_DATA 0x00400LL /* Write data to journal */ +#define EXT4_MOUNT_ORDERED_DATA 0x00800LL /* Flush data before commit */ +#define EXT4_MOUNT_WRITEBACK_DATA 0x00C00LL /* No data ordering */ +#define 
EXT4_MOUNT_UPDATE_JOURNAL 0x01000LL /* Update the journal format */ +#define EXT4_MOUNT_NO_UID32 0x02000LL /* Disable 32-bit UIDs */ +#define EXT4_MOUNT_XATTR_USER 0x04000LL /* Extended user attributes */ +#define EXT4_MOUNT_POSIX_ACL 0x08000LL /* POSIX Access Control Lists */ +#define EXT4_MOUNT_NO_AUTO_DA_ALLOC 0x10000LL /* No auto delalloc mapping */ +#define EXT4_MOUNT_BARRIER 0x20000LL /* Use block barriers */ +#define EXT4_MOUNT_NOBH 0x40000LL /* No bufferheads */ +#define EXT4_MOUNT_QUOTA 0x80000LL /* Some quota option set */ +#define EXT4_MOUNT_USRQUOTA 0x100000LL /* "old" user quota */ +#define EXT4_MOUNT_GRPQUOTA 0x200000LL /* "old" group quota */ +#define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000LL /* Journal checksums */ +#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000LL /* Journal Async Commit */ +#define EXT4_MOUNT_I_VERSION 0x2000000LL /* i_version support */ +#define EXT4_MOUNT_DELALLOC 0x8000000LL /* Delalloc support */ +#define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000LL /* Abort on file data write */ +#define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000LL /* Block validity checking */ +#define EXT4_MOUNT_DISCARD 0x40000000LL /* Issue DISCARD requests */ #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt #define set_opt(o, opt) o |= EXT4_MOUNT_##opt @@ -915,7 +915,7 @@ struct ext4_sb_info { struct buffer_head * s_sbh; /* Buffer containing the super block */ struct ext4_super_block *s_es; /* Pointer to the super block in the buffer */ struct buffer_head **s_group_desc; - unsigned int s_mount_opt; + unsigned long long s_mount_opt; unsigned int s_mount_flags; ext4_fsblk_t s_sb_block; uid_t s_resuid; diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index b57e5c7..36e1f98 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -58,7 +58,7 @@ int __ext4_forget(const char *where, handle_t *handle, int is_metadata, BUFFER_TRACE(bh, "enter"); jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, " - "data mode %x\n", + "data mode %llx\n", bh, is_metadata, 
inode->i_mode, test_opt(inode->i_sb, DATA_FLAGS)); diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 84a51d9..80d6c14 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1700,7 +1700,7 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es, ext4_commit_super(sb, 1); if (test_opt(sb, DEBUG)) printk(KERN_INFO "[EXT4 FS bs=%lu, gc=%u, " - "bpg=%lu, ipg=%lu, mo=%04x]\n", + "bpg=%lu, ipg=%lu, mo=%08llx]\n", sb->s_blocksize, sbi->s_groups_count, EXT4_BLOCKS_PER_GROUP(sb), -- 1.6.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
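A short user-space sketch of what widening the option word buys, and of one hazard it introduces. The toy_* names are hypothetical, and TOY_MOUNT_FUTURE is an invented flag above bit 31 that does not exist in the patch. The format-string changes above suggest that test_opt() yields the masked 64-bit value; the sketch instead collapses the result to 0 or 1, which is an assumption about what callers want rather than what the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MOUNT_DEBUG		0x00008ULL
#define TOY_MOUNT_DISCARD	0x40000000ULL
#define TOY_MOUNT_FUTURE	0x100000000ULL	/* hypothetical flag above bit 31 */

struct toy_sb_info {
	unsigned long long s_mount_opt;
};

#define toy_set_opt(o, opt)	((o) |= TOY_MOUNT_##opt)
#define toy_clear_opt(o, opt)	((o) &= ~TOY_MOUNT_##opt)
/* Collapse to 0/1 so the result survives assignment to a narrower type. */
#define toy_test_opt(sbi, opt)	(((sbi)->s_mount_opt & TOY_MOUNT_##opt) != 0)

/* A 64-bit option word needs the ll length modifier when printed. */
static int toy_format_opts(const struct toy_sb_info *sbi, char *buf, size_t len)
{
	return snprintf(buf, len, "mo=%08llx", sbi->s_mount_opt);
}
```

With a raw masked value, storing the result of a test on a high flag in a 32-bit variable silently yields zero, so any caller that keeps the result of test_opt() in an int would need auditing once flags move past bit 31.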
* [PATCH 6/6] ext4: Implement metagroup support for ext4 filesystem 2010-02-18 16:45 ` [PATCH 5/6] ext4: enlarge mount option field Dmitry Monakhov @ 2010-02-18 16:45 ` Dmitry Monakhov 0 siblings, 0 replies; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-18 16:45 UTC (permalink / raw) To: linux-fsdevel; +Cc: Dmitry Monakhov Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> --- fs/ext4/Kconfig | 8 +++ fs/ext4/Makefile | 1 + fs/ext4/ext4.h | 8 ++- fs/ext4/ialloc.c | 5 +- fs/ext4/inode.c | 13 ++++- fs/ext4/super.c | 9 +++- fs/ext4/xattr.c | 7 ++ fs/ext4/xattr.h | 11 +++ fs/ext4/xattr_metagroup.c | 153 +++++++++++++++++++++++++++++++++++++++++++++ 9 files changed, 211 insertions(+), 4 deletions(-) create mode 100644 fs/ext4/xattr_metagroup.c diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig index 9ed1bb1..e3365db 100644 --- a/fs/ext4/Kconfig +++ b/fs/ext4/Kconfig @@ -74,6 +74,14 @@ config EXT4_FS_SECURITY If you are not using a security module that requires using extended attributes for file security labels, say N. +config EXT4_METAGROUP + bool "Ext4 metagroup support" + depends on METAGROUP + depends on EXT4_FS_XATTR + help + Enables metagroup inode identifier support for ext4 filesystem. + This feature allow to assign some id to inodes similar to + uid/gid. 
config EXT4_DEBUG bool "EXT4 debugging support" diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile index 8867b2a..62f75b8 100644 --- a/fs/ext4/Makefile +++ b/fs/ext4/Makefile @@ -11,3 +11,4 @@ ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \ ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o +ext4-$(CONFIG_EXT4_METAGROUP) += xattr_metagroup.o diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index b2c01a2..c3f95e7 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -719,6 +719,10 @@ struct ext4_inode_info { */ tid_t i_sync_tid; tid_t i_datasync_tid; +#ifdef CONFIG_EXT4_METAGROUP + /* metagroup id, additional owner identifier similar to uid/gid */ + unsigned int i_mid; +#endif }; /* @@ -766,7 +770,9 @@ struct ext4_inode_info { #define EXT4_MOUNT_DELALLOC 0x8000000LL /* Delalloc support */ #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000LL /* Abort on file data write */ #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000LL /* Block validity checking */ -#define EXT4_MOUNT_DISCARD 0x40000000LL /* Issue DISCARD requests */ +#define EXT4_MOUNT_DISCARD 0x40000000LL /* Issue DISCARD requests +*/ +#define EXT4_MOUNT_METAGROUP 0x80000000LL /* extended owner id */ #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt #define set_opt(o, opt) o |= EXT4_MOUNT_##opt diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index f3624ea..535b905 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -1032,7 +1032,10 @@ got: ei->i_state = EXT4_STATE_NEW; ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize; - +#ifdef CONFIG_EXT4_METAGROUP + // XXX: move this to generic inode init helper + EXT4_I(inode)->i_mid = EXT4_I(dir)->i_mid; +#endif ret = inode; if (vfs_dq_alloc_inode(inode)) { err = -EDQUOT; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e119524..b1b5fdc 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4936,7 +4936,18 @@ struct inode *ext4_iget(struct super_block 
*sb, unsigned long ino) } if (ret) goto bad_inode; - +#ifdef CONFIG_EXT4_METAGROUP + if(test_opt(inode->i_sb, METAGROUP)) { + ret = ext4_metagroup_read(inode, &ei->i_mid); + if (ret == -ENODATA) { + ei->i_mid = 0; + ret = 0; + } + if (ret) + goto bad_inode; + } else + ei->i_mid = 0; +#endif if (S_ISREG(inode->i_mode)) { inode->i_op = &ext4_file_inode_operations; inode->i_fop = &ext4_file_operations; diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 80d6c14..cb169f8 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -923,6 +923,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs) if (test_opt(sb, DISCARD)) seq_puts(seq, ",discard"); + if (test_opt(sb, METAGROUP)) + seq_puts(seq, ",metagroup"); + if (test_opt(sb, NOLOAD)) seq_puts(seq, ",norecovery"); @@ -1112,7 +1115,7 @@ enum { Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_block_validity, Opt_noblock_validity, Opt_inode_readahead_blks, Opt_journal_ioprio, - Opt_discard, Opt_nodiscard, + Opt_discard, Opt_nodiscard, Opt_metagroup, }; static const match_table_t tokens = { @@ -1181,6 +1184,7 @@ static const match_table_t tokens = { {Opt_noauto_da_alloc, "noauto_da_alloc"}, {Opt_discard, "discard"}, {Opt_nodiscard, "nodiscard"}, + {Opt_metagroup, "metagroup"}, {Opt_err, NULL}, }; @@ -1612,6 +1616,9 @@ set_qf_format: case Opt_nodiscard: clear_opt(sbi->s_mount_opt, DISCARD); break; + case Opt_metagroup: + set_opt(sbi->s_mount_opt, METAGROUP); + break; default: ext4_msg(sb, KERN_ERR, "Unrecognized mount option \"%s\" " diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index f3a2f7e..a97294b 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -107,6 +107,10 @@ static struct xattr_handler *ext4_xattr_handler_map[] = { #ifdef CONFIG_EXT4_FS_SECURITY [EXT4_XATTR_INDEX_SECURITY] = &ext4_xattr_security_handler, #endif +#ifdef CONFIG_EXT4_METAGROUP + [EXT4_XATTR_INDEX_METAGROUP] = &ext4_xattr_metagroup_handler, +#endif + }; struct xattr_handler *ext4_xattr_handlers[] = { @@ -119,6 +123,9 @@ struct 
xattr_handler *ext4_xattr_handlers[] = { #ifdef CONFIG_EXT4_FS_SECURITY &ext4_xattr_security_handler, #endif +#ifdef CONFIG_EXT4_METAGROUP + &ext4_xattr_metagroup_handler, +#endif NULL }; diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index 8ede88b..46b8369 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -21,6 +21,7 @@ #define EXT4_XATTR_INDEX_TRUSTED 4 #define EXT4_XATTR_INDEX_LUSTRE 5 #define EXT4_XATTR_INDEX_SECURITY 6 +#define EXT4_XATTR_INDEX_METAGROUP 7 struct ext4_xattr_header { __le32 h_magic; /* magic number for identification */ @@ -70,6 +71,7 @@ extern struct xattr_handler ext4_xattr_trusted_handler; extern struct xattr_handler ext4_xattr_acl_access_handler; extern struct xattr_handler ext4_xattr_acl_default_handler; extern struct xattr_handler ext4_xattr_security_handler; +extern struct xattr_handler ext4_xattr_metagroup_handler; extern ssize_t ext4_listxattr(struct dentry *, char *, size_t); @@ -153,3 +155,12 @@ static inline int ext4_init_security(handle_t *handle, struct inode *inode, return 0; } #endif + +#ifdef CONFIG_EXT4_METAGROUP +extern int ext4_metagroup_read(struct inode *inode, unsigned int *mid); +#else +inline int ext4_metagroup_read(struct inode *inode, unsigned int *mid) +{ + return -ENOTSUPP; +} +#endif diff --git a/fs/ext4/xattr_metagroup.c b/fs/ext4/xattr_metagroup.c new file mode 100644 index 0000000..5585d4d --- /dev/null +++ b/fs/ext4/xattr_metagroup.c @@ -0,0 +1,153 @@ +/* + * linux/fs/ext4/xattr_metagroup.c + * + * Copyright (C) 2010 Parallels Inc + * Dmitry Monakhov <dmonakhov@openvz.org> + */ + +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/capability.h> +#include <linux/fs.h> +#include <linux/quotaops.h> +#include "ext4_jbd2.h" +#include "ext4.h" +#include "xattr.h" + +/* + * Read metagroup id from inode's xattr + * Locking: none + */ +int ext4_metagroup_read(struct inode *inode, unsigned int *mid) +{ + __le32 dsk_mid; + int retval; + retval = ext4_xattr_get(inode, 
EXT4_XATTR_INDEX_METAGROUP, "", + &dsk_mid, sizeof (dsk_mid)); + if (retval > 0 && retval != sizeof(dsk_mid)) + return -EIO; + *mid = le32_to_cpu(dsk_mid); + return retval; + +} + +/* + * Save metagroup id to inode's xattr + * Locking: none + */ +static int ext4_metagroup_write(handle_t *handle, struct inode *inode, + unsigned int mid, int xflags) +{ + __le32 dsk_mid; + int retval; + retval = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_METAGROUP, "", + &dsk_mid, sizeof (dsk_mid), xflags); + if (retval > 0 && retval != sizeof(dsk_mid)) + return -EIO; + return retval; +} + +/* + * Change metagroup id. + * Called under inode->i_mutex + */ +static int ext4_metagroup_change(struct inode *inode, unsigned int new_mid) +{ + /* + * One data_trans_blocks chunk for xattr update. + * One quota_trans_blocks chunk for quota transfer, and one + * quota_trans_block chunk for emergency quota rollback transfer, + * because quota rollback may result new quota blocks allocation. + */ + unsigned credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + + EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb) * 2; + qid_t qid[MAXQUOTAS]; + int ret, ret2 = 0; + unsigned retries = 0; + handle_t *handle; + + vfs_dq_init(inode); +retry: + handle = ext4_journal_start(inode, credits); + if (IS_ERR(handle)) { + ret = PTR_ERR(handle); + ext4_std_error(inode->i_sb, ret); + goto out; + } + /* Inode may not have metagroup xattr yet. 
Create it explicitly */ + ret = ext4_metagroup_write(handle, inode, EXT4_I(inode)->i_mid, + XATTR_CREATE); + if (ret == -EEXIST) + ret = 0; + if (ret) { + ret2 = ext4_journal_stop(handle); + if (ret2) + ret = ret2; + if (ret == -ENOSPC && + ext4_should_retry_alloc(inode->i_sb, &retries)) + goto retry; + } +#ifdef CONFIG_QUOTA + qid[MGRQUOTA] = new_mid; + if (inode->i_sb->dq_op->transfer(inode, qid, 1 << MGRQUOTA)) + ret = -EDQUOT; +#endif + ret = ext4_metagroup_write(handle, inode, new_mid, XATTR_REPLACE); + if (ret) { + /* + * Function may fail only due to fatal error, Nor than less + * we have try to rollback quota changes. + */ +#ifdef CONFIG_QUOTA + qid[MGRQUOTA] = EXT4_I(inode)->i_mid; + if (inode->i_sb->dq_op->transfer(inode, qid, 1 << MGRQUOTA)) + ret = -EDQUOT; +#endif + ext4_std_error(inode->i_sb, ret); + + } + EXT4_I(inode)->i_mid = new_mid; + ret2 = ext4_journal_stop(handle); +out: + if (ret2) + ret = ret2; + return ret; +} +static size_t +ext4_xattr_metagroup_list(struct dentry *dentry, char *list, size_t list_size, + const char *name, size_t name_len, int type) +{ + if (list && XATTR_METAGROUP_LEN <= list_size) + memcpy(list, XATTR_METAGROUP_PREFIX, XATTR_METAGROUP_LEN); + return XATTR_METAGROUP_LEN; + +} + +static int +ext4_xattr_metagroup_get(struct dentry *dentry, const char *name, + void *buffer, size_t size, int type) +{ + if (strcmp(name, "") != 0) + return -EINVAL; + return ext4_xattr_get(dentry->d_inode, EXT4_XATTR_INDEX_METAGROUP, + name, buffer, size); +} + +static int +ext4_xattr_metagroup_set(struct dentry *dentry, const char *name, + const void *value, size_t size, int flags, int type) +{ + unsigned int new_mid; + if (strcmp(name, "") != 0) + return -EINVAL; + new_mid = simple_strtoul(value, (char **)&value, 0); + return ext4_metagroup_change(dentry->d_inode, new_mid); +} + +struct xattr_handler ext4_xattr_metagroup_handler = { + .prefix = XATTR_METAGROUP, + .list = ext4_xattr_metagroup_list, + .get = ext4_xattr_metagroup_get, + .set = 
ext4_xattr_metagroup_set, +}; -- 1.6.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
  2010-02-18 16:45   ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
@ 2010-02-18 19:00   ` Brad Boyer
  2010-02-18 19:34     ` Dmitry Monakhov
  1 sibling, 1 reply; 13+ messages in thread
From: Brad Boyer @ 2010-02-18 19:00 UTC
To: Dmitry Monakhov; +Cc: linux-fsdevel

On Thu, Feb 18, 2010 at 07:45:25PM +0300, Dmitry Monakhov wrote:
> Some times it is useful to export non standard attributes
> to generic vfs layer, but it is too expansive to store it
> inside vfs inode. Let's introduce generic interface for this
> purpose. One may declare an attribute and filesystem provides
> access to it, if necessery.
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
> @@ -1576,7 +1578,10 @@ struct super_operations {
>  #endif
>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
>  };
> -
> +struct aux_attributes
> +{
> +	int supported;
> +};
>  /*
>   * Inode state bits.  Protected by inode_lock.
>   *
> --

What is the intended use of the supported field? You don't appear to use
it anywhere other than to initialize it to 1 in the one instance where
you create one of them.

	Brad Boyer
	flar@allandria.com
* Re: [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table
  2010-02-18 19:00   ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
@ 2010-02-18 19:34     ` Dmitry Monakhov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Monakhov @ 2010-02-18 19:34 UTC (permalink / raw)
  To: Brad Boyer; +Cc: linux-fsdevel

Brad Boyer <flar@allandria.com> writes:

> On Thu, Feb 18, 2010 at 07:45:25PM +0300, Dmitry Monakhov wrote:
>> Some times it is useful to export non standard attributes
>> to generic vfs layer, but it is too expansive to store it
>> inside vfs inode. Let's introduce generic interface for this
>> purpose. One may declare an attribute and filesystem provides
>> access to it, if necessery.
>>
>> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
>> ---
>> @@ -1576,7 +1578,10 @@ struct super_operations {
>>  #endif
>>  	int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
>>  };
>> -
>> +struct aux_attributes
>> +{
>> +	int supported;
>> +};
>>  /*
>>   * Inode state bits. Protected by inode_lock.
>>   *
>> --
>
> What is the intended use of the supported field? You don't appear to use
> it anywhere other than to initialize it to 1 in the one instance where
> you create one of them.
Actually, I've used it only as a placeholder; otherwise the structure
would be empty.
>
> 	Brad Boyer
> 	flar@allandria.com

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4 2010-02-18 16:45 [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dmitry Monakhov 2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov @ 2010-02-18 23:31 ` Dave Chinner 2010-02-19 10:16 ` Dmitry Monakhov 1 sibling, 1 reply; 13+ messages in thread From: Dave Chinner @ 2010-02-18 23:31 UTC (permalink / raw) To: Dmitry Monakhov; +Cc: linux-fsdevel On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote: > This is new generation of attempt to add extended inode identifier. > In previous posts it was called tree_id, subtree_id, project_id. > But after none of this was not good enough. I've refused project_id > because it is well know XFS feature. Admins, users and developers of mangement tools are all going to hate us if we introduce subtly different "project/directory quota like" accounting to different filesystems with different administration mechanisms. The fact that project quotas are already implemented in XFS is not a valid reason for creating a new, slightly less functional, incompatible implementation of the same feature in other filesystems. > And my implementation is > slightly different from it especially from user-space point of view. This is exactly my point - if a user has an ext4 filesystem and an xfs filesystem then your proposal will result in them needing two different mechanisms to manage the project/directory quotas on their filesystems. This result is not desirable from a system design perspective. Management of such a feature needs to be consistent across all filesystem types - just like it is for user and group quotas - and we already have a widely used and well tested management interface that can be used to implement exactly what you need. > In order to avoid ambiguity i've stopped at the "metagroup" term. > I hope it is final name for the feature. 
I think "metagroup" is too abstract and will likely be confused with group quotas by those that don't understand what it is. i.e it does not convey any information about the bounds of the quota container (unlike user, group, directory or project). Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4 2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner @ 2010-02-19 10:16 ` Dmitry Monakhov 2010-02-19 23:31 ` Dave Chinner 0 siblings, 1 reply; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-19 10:16 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel Dave Chinner <david@fromorbit.com> writes: > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote: >> This is new generation of attempt to add extended inode identifier. >> In previous posts it was called tree_id, subtree_id, project_id. >> But after none of this was not good enough. I've refused project_id >> because it is well know XFS feature. > > Admins, users and developers of mangement tools are all going to > hate us if we introduce subtly different "project/directory quota > like" accounting to different filesystems with different > administration mechanisms. Seems what you right here. > > The fact that project quotas are already implemented in XFS is not a > valid reason for creating a new, slightly less functional, > incompatible implementation of the same feature in other > filesystems. > >> And my implementation is >> slightly different from it especially from user-space point of view. > > This is exactly my point - if a user has an ext4 filesystem and an > xfs filesystem then your proposal will result in them needing two > different mechanisms to manage the project/directory quotas on their > filesystems. This result is not desirable from a system design > perspective. Management of such a feature needs to be consistent > across all filesystem types - just like it is for user and group > quotas - and we already have a widely used and well tested > management interface that can be used to implement exactly what you > need. Not exactly. XFS allow only subtree-like structure (link, rename are restricted). 
Personally I think what right restriction, but someone may want to have not subtree-like hierarchy. So this patch doesn't introduce any link/rename rules. If user want to restrict his tree it will use bindmount. IMHO it is more intuitive than XFS does. But again you definitely right about feature_names/interfaces ambiguity If we can create common interface it would be great. See later in the mail. > >> In order to avoid ambiguity i've stopped at the "metagroup" term. >> I hope it is final name for the feature. > > I think "metagroup" is too abstract and will likely be confused with > group quotas by those that don't understand what it is. i.e it does > not convey any information about the bounds of the quota container > (unlike user, group, directory or project). Ok. Since we want common interface we should use well known "project_id" term. I think we can try to unify it in following way: *User interface* As soon as i understand XFS manage projid via xfs_ioctl_setattr, struct fsxattr. IMHO it is not good idea to make this interface common for all filesystems. Let's use standard i_op->setxattr/getxattr for this purpose. Let's name this xattr as "system.project_id". And xfs may easily catch corresponding setxattr/getxatrr and translate it to it's ioctl interface, so both interfaces will be equal. At least xattr interface already supported by various utils (tar, rsync, etc). *Link/Rename behavior* Let's introduce two modes: 1) SHARED project hierarchy: without restrictions for link/renames 2) ISOLATED project hierarchy: Well known XFS (subtrees like) link/rename rules And support this two mode like this: generic_fs) SHARED: by default ISOLATED: via bindmount XFS) ISOLATED: by default, because this is expected semantics (no changes required) SHARED: xfs may add "shared_project" mount feature to disable isolation semantics. At least this gives user more flexibility than before. We have to document such difference. In order to avoid misbehavior. 
*VFS interface to project_id*
To make use of project_id we have to make it visible to the VFS layer
and let quota and nfsd (any other users?) exploit it.
Let's use the proposed per-sb aux_attributes table for this purpose.
Of course, I was wrong when I proposed to export a pointer to the
project_id (former metagroup) variable. Since this value is read-only,
we have to export it like this:
	unsigned get_project_id(struct inode *inode)
and document that project_id changes are guarded by inode->i_mutex,
so a caller has to grab i_mutex in order to avoid races.

What do you think?

^ permalink raw reply	[flat|nested] 13+ messages in thread
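For the *User interface* part of the proposal above, here is a user-space sketch of reading and setting the id through the standard Linux xattr calls. The attribute name "system.project_id" is only the name proposed in this thread, not an interface any kernel is known to implement. The ASCII-text value encoding is an assumption as well, carried over from the ext4 set handler in the posted patch. The helper names read_project_id() and write_project_id() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Name proposed in this thread; hypothetical, not a merged kernel ABI. */
#define PROJECT_ID_XATTR "system.project_id"

/*
 * Read the project id of 'path' into *id.
 * Returns 0 on success or a negative errno value.
 * Assumes the value is stored as ASCII text (an assumption).
 */
static int read_project_id(const char *path, unsigned long *id)
{
	char buf[32];
	char *end;
	unsigned long val;
	ssize_t len;

	assert(path != NULL && id != NULL);

	/* Leave room for the terminator; xattr values carry no NUL. */
	len = getxattr(path, PROJECT_ID_XATTR, buf, sizeof(buf) - 1);
	if (len < 0)
		return -errno;
	buf[len] = '\0';

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno != 0 || end == buf || *end != '\0')
		return -EINVAL;
	*id = val;
	return 0;
}

/*
 * Set the project id of 'path'. Under the proposal, the filesystem's
 * handler would perform the quota transfer on the kernel side.
 */
static int write_project_id(const char *path, unsigned long id)
{
	char buf[32];
	int len;

	assert(path != NULL);

	len = snprintf(buf, sizeof(buf), "%lu", id);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;
	if (setxattr(path, PROJECT_ID_XATTR, buf, (size_t)len, 0) < 0)
		return -errno;
	return 0;
}
```

On a filesystem without such a handler the calls simply fail with an error from the xattr layer, so the helpers can be tried safely on any path.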
* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4 2010-02-19 10:16 ` Dmitry Monakhov @ 2010-02-19 23:31 ` Dave Chinner 2010-02-20 10:58 ` Dmitry Monakhov 0 siblings, 1 reply; 13+ messages in thread From: Dave Chinner @ 2010-02-19 23:31 UTC (permalink / raw) To: Dmitry Monakhov; +Cc: linux-fsdevel On Fri, Feb 19, 2010 at 01:16:47PM +0300, Dmitry Monakhov wrote: > Dave Chinner <david@fromorbit.com> writes: > > > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote: > >> This is new generation of attempt to add extended inode identifier. > >> In previous posts it was called tree_id, subtree_id, project_id. > >> But after none of this was not good enough. I've refused project_id > >> because it is well know XFS feature. > > > > Admins, users and developers of mangement tools are all going to > > hate us if we introduce subtly different "project/directory quota > > like" accounting to different filesystems with different > > administration mechanisms. > Seems what you right here. > > > > The fact that project quotas are already implemented in XFS is not a > > valid reason for creating a new, slightly less functional, > > incompatible implementation of the same feature in other > > filesystems. > > > >> And my implementation is > >> slightly different from it especially from user-space point of view. > > > > This is exactly my point - if a user has an ext4 filesystem and an > > xfs filesystem then your proposal will result in them needing two > > different mechanisms to manage the project/directory quotas on their > > filesystems. This result is not desirable from a system design > > perspective. Management of such a feature needs to be consistent > > across all filesystem types - just like it is for user and group > > quotas - and we already have a widely used and well tested > > management interface that can be used to implement exactly what you > > need. > Not exactly. XFS allow only subtree-like structure Not true at all. 
XFS allows an arbitrary distribution of files in a
given project - they are not restricted to subtrees. This isn't
widely used because it requires manually setting the project ID
after the file is created. e.g. create a backup tarball of a project
hierarchy in an external non-controlled directory, then change the
project ID of the tarball to the correct project ID so that the
backup is also accounted to the correct project...

For example, I'll create a new project (testproj) and subtree
(/mnt/xfs/foo) associated with the project, create a 25MB file
inside the subtree, show it being accounted, then copy it outside
the subtree, show it isn't accounted, then change the project ID
of the outside copy to testproj and show that it is accounted to
the testproj even though it is outside the subtree:

# mkfs.xfs -f /dev/ubd/1
[.....]
# mount -o prjquota /dev/ubd/1 /mnt/xfs
# mkdir /mnt/xfs/foo
#
#
# echo testproj:42 >> /etc/projid
# echo 42:/mnt/xfs/foo >> /etc/projects
# xfs_quota -x -c 'project -s testproj' /mnt/xfs
Setting up project testproj (path /mnt/xfs/foo)...
Processed 1 /etc/projects paths for project testproj
#
#
#
# xfs_quota -x -c 'limit -p bhard=1g testproj' /mnt/xfs
# xfs_quota -x -c print /mnt/xfs
Filesystem          Pathname
/mnt/xfs            /dev/ubd/1 (pquota)
/mnt/xfs/foo        /dev/ubd/1 (project 42, testproj)
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj            0          0    1048576     00 [--------]

#
#
#
# dd if=/dev/zero of=foo/testfile bs=1024k count=25
25+0 records in
25+0 records out
26214400 bytes (26 MB) copied, 0.116102 s, 226 MB/s
# sudo xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# cp foo/testfile .
# sync
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        25600          0    1048576     00 [--------]

#
#
#
# xfs_io -f -c "chproj 42" testfile
# xfs_quota -x -c report /mnt/xfs
Project quota on /mnt/xfs (/dev/ubd/1)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
testproj        51200          0    1048576     00 [--------]

#

> (link, rename are restricted).

The EXDEV on rename behaviour is purely an implementation detail -
it makes quota accounting in XFS simple. i.e. rename returns EXDEV
so that a mv(1) will fall back to create/copy/unlink and that
automatically gets the quota accounting correct. That is, it didn't
require a complex extension of dquot handling in the rename
transaction to implement. This one could be fixed, and a couple of
people have actually asked recently if it could be done because moving
a few TB of data between projects is time consuming.

However, hard links are a different matter. If you can clearly
determine how to hard link a file into multiple different projects
(dquots), then track and account for all the space used in a sane
manner, work out how to account for new or removed files in such a
hardlinked directory, etc, then you can allow hard links between
different subtrees.

For example, if you add a new file into such a hard linked
directory, who does it get accounted to? What happens if you then
move a multiple-hard linked file to a different subtree? If the
inode is accounted to all projects, then each of these filesystem
transactions requires updating an arbitrary (unbound) number of
dquots - this alone makes journal reservations for transactions a
nightmare to calculate and greatly increases the complexity of such
transactions.

Disallowing hard links between directories in different projects
makes these cans of worms go away - it is a very practical design
choice to make.
However, it in no way results in XFS project quotas being restricted to subtrees - it is a *change of project quota* that triggers these behaviours. > Personally I think what right restriction, but someone may > want to have not subtree-like hierarchy. So this patch doesn't introduce > any link/rename rules. The link/rename behaviour of XFS does not prevent this type of usage at all. > If user want to restrict his tree it will use > bindmount. IMHO it is more intuitive than XFS does. XFS is not trying to implement bind mount -like restrictions. The behaviour was carefully designed to allow project quota's to be sanely implemented. > But again you definitely right about feature_names/interfaces ambiguity > If we can create common interface it would be great. See later in > the mail. > > > >> In order to avoid ambiguity i've stopped at the "metagroup" term. > >> I hope it is final name for the feature. > > > > I think "metagroup" is too abstract and will likely be confused with > > group quotas by those that don't understand what it is. i.e it does > > not convey any information about the bounds of the quota container > > (unlike user, group, directory or project). > Ok. Since we want common interface we should use well known "project_id" > term. > > I think we can try to unify it in following way: > *User interface* > As soon as i understand XFS manage projid via xfs_ioctl_setattr, > struct fsxattr. IMHO it is not good idea to make this interface common > for all filesystems. Let's use standard i_op->setxattr/getxattr for > this purpose. Let's name this xattr as "system.project_id". That's fine by me. I'd much prefer that we used the xattr interface for inode attributes instead of poking bits through fcntl or ioctls... > And xfs may easily catch corresponding setxattr/getxatrr and translate > it to it's ioctl interface, so both interfaces will be equal. > At least xattr interface already supported by various utils (tar, > rsync, etc). 
Well, the point of the way XFS implements project quotas is that utilities such as cp, mv, tar, rsync, etc do not need to know anything about them - just like user/group quotas. If we go down the xattr route, then these utilities can't be allowed to copy these xattrs to new files; the filesystem has to create them atomically with the new inodes so that they are accounted correctly. If they are created non-atomically and the system crashes between creating the file and applying the quota xattr, then you have an inconsistency that only a quotacheck will pick up.... > *Link/Rename behavior* > Let's introduce two modes: > 1) SHARED project hierarchy: without restrictions for link/renames See above - I don't think "without restrictions" can be easily implemented because of the complexity hard links introduce. > 2) ISOLATED project hierarchy: Well known XFS (subtrees like) > link/rename rules > And support this two mode like this: > generic_fs) > SHARED: by default > ISOLATED: via bindmount > XFS) This is a change of behaviour from the existing XFS project quota configurations as they do not require bind mounts at all. I'm interested to know how you see this working when you have multiple subtrees with the same project ID? Renaming and linking between those subtrees is currently possible with XFS project IDs, but adding bind mounts would cause EXDEV to be returned for these operations. i.e. It seems to me that these subtrees are "shared" by your definition, but the addition of bind mounts makes them "isolated". Or you want a part of a subtree to be moved to a different project ID because it needs to be accounted separately? e.g. a group gets moved in the organisation heirarchy, so the bean counters want to change the project ID on all their files so there space usage can be billed to the new department. If bind mounts are involved, this quickly becomes complex and unmaintainable. 
It's not something that users can easily manage, especially compared to the current 'xfs_io -c "chproj -R <projid>" /path/to/subtree' method of doing this. ---- IMO focusing on link/rename restrictions as the deciding factor in defining the user interface is wrong. I started out by saying that having different user interfaces for different filesystems is not desirable. You've ended up trying to encode the differences you assume exist into a new user interface instead. I'll rephrase the question - what part of the existing XFS project quota administration interface (i.e. /etc/projects, /etc/projid, a quota command to set up the initial tree, etc) is not sufficient for your purposes of defining and managing subtrees? If it is not sufficient, what simple extensions can we add that will make it sufficient? Once we've got the high level management interface defined, everything else is just details. ;) > ISOLATED: by default, because this is expected semantics (no > changes required) > SHARED: xfs may add "shared_project" mount feature to disable > isolation semantics. At least this gives user more > flexibility than before. > We have to document such difference. In order to avoid misbehavior. > *VFS interface to project_id* > In order to make profit of project_id we have to make it visible to > vfs layer, and let quota and nfsd (any other users?) exploit this. > Let's use proposed per-sb aux_attributes table for this purpose. Why go to that complexity? Just add a 32 bit proj_id identifier to the struct inode. If it's supposed to be generic, then simply implement it like user and group quotas are. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/6] RFC: introduce extended inode owner identifier v4 2010-02-19 23:31 ` Dave Chinner @ 2010-02-20 10:58 ` Dmitry Monakhov 0 siblings, 0 replies; 13+ messages in thread From: Dmitry Monakhov @ 2010-02-20 10:58 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-fsdevel [-- Attachment #1: Type: text/plain, Size: 14446 bytes --] Dave Chinner <david@fromorbit.com> writes: > On Fri, Feb 19, 2010 at 01:16:47PM +0300, Dmitry Monakhov wrote: >> Dave Chinner <david@fromorbit.com> writes: >> >> > On Thu, Feb 18, 2010 at 07:45:24PM +0300, Dmitry Monakhov wrote: >> >> This is new generation of attempt to add extended inode identifier. >> >> In previous posts it was called tree_id, subtree_id, project_id. >> >> But after none of this was not good enough. I've refused project_id >> >> because it is well know XFS feature. >> > >> > Admins, users and developers of mangement tools are all going to >> > hate us if we introduce subtly different "project/directory quota >> > like" accounting to different filesystems with different >> > administration mechanisms. >> Seems what you right here. >> > >> > The fact that project quotas are already implemented in XFS is not a >> > valid reason for creating a new, slightly less functional, >> > incompatible implementation of the same feature in other >> > filesystems. >> > >> >> And my implementation is >> >> slightly different from it especially from user-space point of view. >> > >> > This is exactly my point - if a user has an ext4 filesystem and an >> > xfs filesystem then your proposal will result in them needing two >> > different mechanisms to manage the project/directory quotas on their >> > filesystems. This result is not desirable from a system design >> > perspective. 
Management of such a feature needs to be consistent >> > across all filesystem types - just like it is for user and group >> > quotas - and we already have a widely used and well tested >> > management interface that can be used to implement exactly what you >> > need. >> Not exactly. XFS allow only subtree-like structure > > Not true at all. XFS allows an arbitrary distribution of files in a > given project - they are not restricted to subtrees. This isn't > widely used because it requires manually setting the project ID > after the file is created. e.g. create a backup tarball of a project > heirarchy in an external non-controlled directory, then change the > project ID of the tarball to the correct project ID so that the > backup is also accounted to the correct project... > > For example, I'll create a new project (testproj) and subtree > (/mnt/xfs/foo) associated with the project, create a 25MB file > inside the subtree, show it being accounted, the copy it outside > the subtree, show it isn't accounted, then change the project ID > of the outside copy to testproj and show that it is accounted to > the testproj even though it is outside the subtree: > > # mkfs.xfs -f /dev/ubd/1 > [.....] > # mount -o prjquota /dev/ubd/1 /mnt/xfs > # mkdir /mnt/xfs/foo > # > # > # echo testproj:42 >> /etc/projid > # echo 42:/mnt/xfs/foo >> /etc/projects > # xfs_quota -x -c 'project -s testproj' /mnt/xfs > Setting up project testproj (path /mnt/xfs/foo)... 
> Processed 1 /etc/projects paths for project testproj > # > # > # > # xfs_quota -x -c 'limit -p bhard=1g testproj' /mnt/xfs > # xfs_quota -x -c print /mnt/xfs > Filesystem Pathname > /mnt/xfs /dev/ubd/1 (pquota) > /mnt/xfs/foo /dev/ubd/1 (project 42, testproj) > # xfs_quota -x -c report /mnt/xfs > Project quota on /mnt/xfs (/dev/ubd/1) > Blocks > Project ID Used Soft Hard Warn/Grace > ---------- -------------------------------------------------- > testproj 0 0 1048576 00 [--------] > > # > # > # > # dd if=/dev/zero of=foo/testfile bs=1024k count=25 > 25+0 records in > 25+0 records out > 26214400 bytes (26 MB) copied, 0.116102 s, 226 MB/s > # sudo xfs_quota -x -c report /mnt/xfs > Project quota on /mnt/xfs (/dev/ubd/1) > Blocks > Project ID Used Soft Hard Warn/Grace > ---------- -------------------------------------------------- > testproj 25600 0 1048576 00 [--------] > > # > # > # > # cp foo/testfile . > # sync > # xfs_quota -x -c report /mnt/xfs > Project quota on /mnt/xfs (/dev/ubd/1) > Blocks > Project ID Used Soft Hard Warn/Grace > ---------- -------------------------------------------------- > testproj 25600 0 1048576 00 [--------] > > # > # > # > # xfs_io -f -c "chproj 42" testfile > # xfs_quota -x -c report /mnt/xfs > Project quota on /mnt/xfs (/dev/ubd/1) > Blocks > Project ID Used Soft Hard Warn/Grace > ---------- -------------------------------------------------- > testproj 51200 0 1048576 00 [--------] > > # > > >> (link, rename are restricted). > > The EXDEV on rename behaviour is purely an implementation detail - > it makes quota accounting in XFS simple. i.e. rename returns EXDEV > so that a mv(1) will fall back to create/copy/unlink and that > automatically gets the quota accounting correct. That is, it didn't > require a complex extension of dquot handling in the rename > transaction to implement. 
This one could be fixed, and a couple of > ppl have actually asked recently if it could be done because moving > a few TB of data between projects is time consuming. > > However, hard links are a different matter. If you can clearly > determine how to hard link a file into multiple different projects > (dquots), then track and account for all the space used in a sane > manner, work out how to account for new or removed files in such a > hardlinked directory, etc, then you can allow hard links between > different subtrees. Yess. I do understand that. In fact initially i've specify rename/link rules by myself, later i've discovered that XFS implemented this long time ago in exact same way. BTW: renames also is not so simple because renaming file which has more than one hardlinks result in same madness situation. But as AlViro pointed this semantics is already implemented in bindmout (IMHO the only bad thing is that bindmount is not persistent structure). We just give user a rope and it his decision to shoot, or not to shoot himself. Otherwise. We may try to force AlViro to like hardlink isolation idea. May be restrict this tiny rule under CONFIG_PROJECT_ID_ISOLATED config option. > > For example, if you add a new file into such a hard linked > directory, who does it get accounted to? What happens if you then > move a multiple-hard linked file to a different subtree? By assumption inode may belongs only to one project, the one thich stored inside private_inode->i_prjid. It will be accounted in that quota. > If the > inode is accounted to all projects, then each of these filesystem > transactions requires updating an arbitrary (unbound) number of > dquots - this alone makes journal reservations for transactions a > nightmare to calculate and greatly increases the complexity of such > transactions. > > Disallowing hard links between directories in different projects > makes these cans of worms go away - it is a very practical design > choice to make. 
However, it in no way results in XFS project quotas > being restricted to subtrees - it is a *change of project quota* > that triggers these behaviours. > >> Personally I think what right restriction, but someone may >> want to have not subtree-like hierarchy. So this patch doesn't introduce >> any link/rename rules. > > The link/rename behaviour of XFS does not prevent this type of usage > at all. > >> If user want to restrict his tree it will use >> bindmount. IMHO it is more intuitive than XFS does. > > XFS is not trying to implement bind mount -like restrictions. The > behaviour was carefully designed to allow project quota's to be > sanely implemented. > >> But again you definitely right about feature_names/interfaces ambiguity >> If we can create common interface it would be great. See later in >> the mail. >> > >> >> In order to avoid ambiguity i've stopped at the "metagroup" term. >> >> I hope it is final name for the feature. >> > >> > I think "metagroup" is too abstract and will likely be confused with >> > group quotas by those that don't understand what it is. i.e it does >> > not convey any information about the bounds of the quota container >> > (unlike user, group, directory or project). >> Ok. Since we want common interface we should use well known "project_id" >> term. >> >> I think we can try to unify it in following way: >> *User interface* >> As soon as i understand XFS manage projid via xfs_ioctl_setattr, >> struct fsxattr. IMHO it is not good idea to make this interface common >> for all filesystems. Let's use standard i_op->setxattr/getxattr for >> this purpose. Let's name this xattr as "system.project_id". > > That's fine by me. I'd much prefer that we used the xattr interface > for inode attributes instead of poking bits through fcntl or ioctls... > >> And xfs may easily catch corresponding setxattr/getxatrr and translate >> it to it's ioctl interface, so both interfaces will be equal. 
>> At least xattr interface already supported by various utils (tar,
>> rsync, etc).
>
> Well, the point of the way XFS implements project quotas is that
> utilities such as cp, mv, tar, rsync, etc do not need to know
> anything about them - just like user/group quotas.
>
> If we go down the xattr route, then these utilities can't be allowed
> to copy these xattrs to new files; the filesystem has to create them
> atomically with the new inodes so that they are accounted correctly.
Exactly. This is how init_acl and init_security work on inode creation
(see fs/ext4/ialloc.c). I missed that by accident in the posted version
of the ext4-add-metagroup-support patch. In fact, I have to confess that
the ext4 metagroup part was not in a working state at posting time.
It seems to work now; see the attached patch.
BTW, the procedure for changing project_id looks really ugly, because we
have to perform two things in a row: quota_transfer, then the proj_id
update. If we are unable to update project_id, we have to roll back the
quota transfer. And in fact the rollback may result in -EDQUOT, because
quota currently has no *force* charge flag :)
> If they are created non-atomically and the system crashes between
> creating the file and applying the quota xattr, then you have an
> inconsistency that only a quotacheck will pick up....
This is just how it works for now. Every tar-like application works
like this:
1) open
2) write
3) chown
4) chmod
So whatever happens before (3) is accounted to the current user_id.
>
>> *Link/Rename behavior*
>> Let's introduce two modes:
>> 1) SHARED project hierarchy: without restrictions for link/renames
>
> See above - I don't think "without restrictions" can be easily
> implemented because of the complexity hard links introduce.
>
>> 2) ISOLATED project hierarchy: Well known XFS (subtrees like)
>> link/rename rules
>> And support this two mode like this:
>> generic_fs)
>> SHARED: by default
>> ISOLATED: via bindmount
>> XFS)
>
> This is a change of behaviour from the existing XFS project quota
> configurations as they do not require bind mounts at all.
>
> I'm interested to know how you see this working when you have
> multiple subtrees with the same project ID? Renaming and linking
Yep, good catch.
> between those subtrees is currently possible with XFS project IDs,
> but adding bind mounts would cause EXDEV to be returned for these
> operations. i.e. It seems to me that these subtrees are "shared" by
> your definition, but the addition of bind mounts makes them
> "isolated".
>
> Or you want a part of a subtree to be moved to a different project
> ID because it needs to be accounted separately? e.g. a group gets
> moved in the organisation heirarchy, so the bean counters want to
> change the project ID on all their files so there space usage can be
> billed to the new department. If bind mounts are involved, this
> quickly becomes complex and unmaintainable. It's not something that
> users can easily manage, especially compared to the current 'xfs_io
> -c "chproj -R <projid>" /path/to/subtree' method of doing this.
It seems we have to work on a "get the vfs people to like isolated
subtrees" plan.
>
> ----
>
> IMO focusing on link/rename restrictions as the deciding factor in
> defining the user interface is wrong. I started out by saying that
> having different user interfaces for different filesystems is not
> desirable. You've ended up trying to encode the differences you
> assume exist into a new user interface instead.
>
> I'll rephrase the question - what part of the existing XFS project
> quota administration interface (i.e. /etc/projects, /etc/projid, a
> quota command to set up the initial tree, etc) is not sufficient for
> your purposes of defining and managing subtrees? If it is not
> sufficient, what simple extensions can we add that will make it
> sufficient? Once we've got the high level management interface
> defined, everything else is just details. ;)
>
The XFS interface is enough. IMHO it is rather rich, but all the
necessary things are already there.
>> ISOLATED: by default, because this is expected semantics (no
>> changes required)
>> SHARED: xfs may add "shared_project" mount feature to disable
>> isolation semantics. At least this gives user more
>> flexibility than before.
>> We have to document such difference. In order to avoid misbehavior.
>
>
>> *VFS interface to project_id*
>> In order to make profit of project_id we have to make it visible to
>> vfs layer, and let quota and nfsd (any other users?) exploit this.
>> Let's use proposed per-sb aux_attributes table for this purpose.
>
> Why go to that complexity? Just add a 32 bit proj_id identifier to
> the struct inode. If it's supposed to be generic, then simply
> implement it like user and group quotas are.
Of course this is the best solution. But when I added the i_rsv_space
field to vfs_inode to support quota reservation for delayed allocation,
many people were strongly against the idea of bloating vfs_inode.
So I came up with the idea of an aux_inode_table that lets everybody put
their stuff into that table without big discussions.
The project_id case is better, though, because the field can be hidden
under a CONFIG_PROJECT_ID option, so no space is wasted when the option
is off.
In the next round I'll embed it into vfs_inode, and if people really
object to this, we will go back to the aux_inode_table approach.
I plan to post the next version next Monday.
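For illustration only, the open/write/chown/chmod sequence mentioned earlier in this reply can be sketched in userspace C. This is not code from the patch set: `restore_file` and its parameters are hypothetical names, and only standard POSIX calls are used. Anything written before the `fchown` step is charged to the ids of the calling process, which is the accounting window under discussion.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: restore one file the way a tar-like tool
 * typically does. Returns 0 on success, -1 on the first failing step.
 */
static int restore_file(const char *path, const char *data,
                        uid_t uid, gid_t gid, mode_t mode)
{
        size_t len;
        int fd;

        assert(path != NULL && data != NULL);
        len = strlen(data);

        /* 1) open: the new inode is owned by (and accounted to) the caller */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
                return -1;

        /* 2) write: blocks are still charged to the caller's ids */
        if (write(fd, data, len) != (ssize_t)len)
                goto fail;

        /* 3) chown: only now does ownership move to the final owner */
        if (fchown(fd, uid, gid) != 0)
                goto fail;

        /* 4) chmod: final permission bits */
        if (fchmod(fd, mode) != 0)
                goto fail;

        return close(fd);
fail:
        close(fd);
        return -1;
}
```

A crash between steps 1 and 3 leaves a file owned by the restoring user. That is the same kind of window Dave describes for a quota xattr applied after inode creation.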
[-- Attachment #2: 0001-ext4-Implement-project-ID-support-for-ext4-filesyste.patch --]
[-- Type: text/plain, Size: 11747 bytes --]

From bb16c459d7d9bee5de5f4a7885ac1edfac0c34aa Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Sat, 20 Feb 2010 13:38:27 +0300
Subject: [PATCH] ext4: Implement project ID support for ext4 filesystem

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/Kconfig            |    8 ++
 fs/ext4/Makefile           |    1 +
 fs/ext4/ext4.h             |    8 ++-
 fs/ext4/ialloc.c           |    8 ++-
 fs/ext4/inode.c            |   13 +++-
 fs/ext4/super.c            |    9 ++-
 fs/ext4/xattr.c            |    7 ++
 fs/ext4/xattr.h            |   19 +++++
 fs/ext4/xattr_project_id.c |  169 ++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100644 fs/ext4/xattr_project_id.c

diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
index 9ed1bb1..1c04c9f 100644
--- a/fs/ext4/Kconfig
+++ b/fs/ext4/Kconfig
@@ -74,6 +74,14 @@ config EXT4_FS_SECURITY
 	  If you are not using a security module that requires using
 	  extended attributes for file security labels, say N.
 
+config EXT4_PROJECT_ID
+	bool "Ext4 project_id support"
+	depends on PROJECT_ID
+	depends on EXT4_FS_XATTR
+	help
+	  Enables project inode identifier support for ext4 filesystem.
+	  This feature allow to assign some id to inodes similar to
+	  uid/gid.
 
 config EXT4_DEBUG
 	bool "EXT4 debugging support"
diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
index 8867b2a..04080cd 100644
--- a/fs/ext4/Makefile
+++ b/fs/ext4/Makefile
@@ -11,3 +11,4 @@ ext4-y := balloc.o bitmap.o dir.o file.o fsync.o ialloc.o inode.o \
 ext4-$(CONFIG_EXT4_FS_XATTR) += xattr.o xattr_user.o xattr_trusted.o
 ext4-$(CONFIG_EXT4_FS_POSIX_ACL) += acl.o
 ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o
+ext4-$(CONFIG_EXT4_PROJECT_ID) += xattr_project_id.o
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b2c01a2..bc5c919 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -719,6 +719,10 @@ struct ext4_inode_info {
 	 */
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
+#ifdef CONFIG_EXT4_PROJECT_ID
+	/* project_id id, additional owner identifier similar to uid/gid */
+	unsigned int i_mid;
+#endif
 };
 
 /*
@@ -766,7 +770,9 @@ struct ext4_inode_info {
 #define EXT4_MOUNT_DELALLOC 0x8000000LL /* Delalloc support */
 #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000LL /* Abort on file data write */
 #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000LL /* Block validity checking */
-#define EXT4_MOUNT_DISCARD 0x40000000LL /* Issue DISCARD requests */
+#define EXT4_MOUNT_DISCARD 0x40000000LL /* Issue DISCARD requests
+*/
+#define EXT4_MOUNT_PROJECT_ID 0x80000000LL /* extended owner id */
 
 #define clear_opt(o, opt) o &= ~EXT4_MOUNT_##opt
 #define set_opt(o, opt) o |= EXT4_MOUNT_##opt
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index f3624ea..ae88188 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -1032,7 +1032,10 @@ got:
 	ei->i_state = EXT4_STATE_NEW;
 
 	ei->i_extra_isize = EXT4_SB(sb)->s_want_extra_isize;
-
+#ifdef CONFIG_EXT4_PROJECT_ID
+	// XXX: move this to generic inode init helper
+	ei->i_mid = EXT4_I(dir)->i_mid;
+#endif
 	ret = inode;
 	if (vfs_dq_alloc_inode(inode)) {
 		err = -EDQUOT;
@@ -1046,6 +1049,9 @@ got:
 	err = ext4_init_security(handle, inode, dir);
 	if (err)
 		goto fail_free_drop;
+	err = ext4_prjid_write(handle, inode, ei->i_mid, XATTR_CREATE);
+	if (err)
+		goto fail_free_drop;
 
 	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		/* set extent flag only for directory, file and normal symlink*/
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e119524..59d5cf1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4936,7 +4936,18 @@ struct inode *ext4_iget(struct super_block *sb, unsigned long ino)
 	}
 	if (ret)
 		goto bad_inode;
-
+#ifdef CONFIG_EXT4_PROJECT_ID
+	if(test_opt(inode->i_sb, PROJECT_ID)) {
+		ret = ext4_prjid_read(inode, &ei->i_mid);
+		if (ret == -ENODATA) {
+			ei->i_mid = 0;
+			ret = 0;
+		}
+		if (ret)
+			goto bad_inode;
+	} else
+		ei->i_mid = 0;
+#endif
 	if (S_ISREG(inode->i_mode)) {
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b8fbab..12b7c2d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -923,6 +923,9 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	if (test_opt(sb, DISCARD))
 		seq_puts(seq, ",discard");
 
+	if (test_opt(sb, PROJECT_ID))
+		seq_puts(seq, ",project_id");
+
 	if (test_opt(sb, NOLOAD))
 		seq_puts(seq, ",norecovery");
 
@@ -1113,7 +1116,7 @@ enum {
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc,
 	Opt_block_validity, Opt_noblock_validity,
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
-	Opt_discard, Opt_nodiscard,
+	Opt_discard, Opt_nodiscard, Opt_project_id,
 };
 
 static const match_table_t tokens = {
@@ -1182,6 +1185,7 @@ static const match_table_t tokens = {
 	{Opt_noauto_da_alloc, "noauto_da_alloc"},
 	{Opt_discard, "discard"},
 	{Opt_nodiscard, "nodiscard"},
+	{Opt_project_id, "project_id"},
 	{Opt_err, NULL},
 };
 
@@ -1613,6 +1617,9 @@ set_qf_format:
 		case Opt_nodiscard:
 			clear_opt(sbi->s_mount_opt, DISCARD);
 			break;
+		case Opt_project_id:
+			set_opt(sbi->s_mount_opt, PROJECT_ID);
+			break;
 		default:
 			ext4_msg(sb, KERN_ERR,
 			       "Unrecognized mount option \"%s\" "
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index f3a2f7e..4dce406 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -107,6 +107,10 @@ static struct xattr_handler *ext4_xattr_handler_map[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	[EXT4_XATTR_INDEX_SECURITY] = &ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_PROJECT_ID
+	[EXT4_XATTR_INDEX_PROJECT_ID] = &ext4_xattr_prjid_handler,
+#endif
+
 };
 
 struct xattr_handler *ext4_xattr_handlers[] = {
@@ -119,6 +123,9 @@ struct xattr_handler *ext4_xattr_handlers[] = {
 #ifdef CONFIG_EXT4_FS_SECURITY
 	&ext4_xattr_security_handler,
 #endif
+#ifdef CONFIG_EXT4_PROJECT_ID
+	&ext4_xattr_prjid_handler,
+#endif
 	NULL
 };
 
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index 8ede88b..f794719 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -21,6 +21,7 @@
 #define EXT4_XATTR_INDEX_TRUSTED 4
 #define EXT4_XATTR_INDEX_LUSTRE 5
 #define EXT4_XATTR_INDEX_SECURITY 6
+#define EXT4_XATTR_INDEX_PROJECT_ID 7
 
 struct ext4_xattr_header {
 	__le32 h_magic; /* magic number for identification */
@@ -70,6 +71,7 @@ extern struct xattr_handler ext4_xattr_trusted_handler;
 extern struct xattr_handler ext4_xattr_acl_access_handler;
 extern struct xattr_handler ext4_xattr_acl_default_handler;
 extern struct xattr_handler ext4_xattr_security_handler;
+extern struct xattr_handler ext4_xattr_prjid_handler;
 
 extern ssize_t ext4_listxattr(struct dentry *, char *, size_t);
 
@@ -153,3 +155,20 @@ static inline int ext4_init_security(handle_t *handle, struct inode *inode,
 	return 0;
 }
 #endif
+
+#ifdef CONFIG_EXT4_PROJECT_ID
+extern int ext4_prjid_read(struct inode *inode, unsigned int *mid);
+extern int ext4_prjid_write(handle_t *handle, struct inode *inode,
+			unsigned int mid, int xflags);
+#else
+inline int ext4_prjid_read(struct inode *inode, unsigned int *mid)
+{
+	return -ENOTSUPP;
+}
+inline int ext4_prjid_write(handle_t *handle, struct inode *inode,
+			unsigned int mid, int xflags)
+{
+	return -ENOTSUPP;
+}
+
+#endif
diff --git a/fs/ext4/xattr_project_id.c b/fs/ext4/xattr_project_id.c
new file mode 100644
index 0000000..1812e47
--- /dev/null
+++ b/fs/ext4/xattr_project_id.c
@@ -0,0 +1,169 @@
+/*
+ * linux/fs/ext4/xattr_project_id.c
+ *
+ * Copyright (C) 2010 Parallels Inc
+ * Dmitry Monakhov <dmonakhov@openvz.org>
+ */
+
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/capability.h>
+#include <linux/fs.h>
+#include <linux/quotaops.h>
+#include "ext4_jbd2.h"
+#include "ext4.h"
+#include "xattr.h"
+
+/*
+ * Read project_id id from inode's xattr
+ * Locking: none
+ */
+int ext4_prjid_read(struct inode *inode, unsigned int *mid)
+{
+	__le32 dsk_mid;
+	int retval;
+	retval = ext4_xattr_get(inode, EXT4_XATTR_INDEX_PROJECT_ID, "",
+				&dsk_mid, sizeof (dsk_mid));
+	if (retval > 0) {
+		if (retval != sizeof(dsk_mid))
+			return -EIO;
+		else
+			retval = 0;
+	}
+	*mid = le32_to_cpu(dsk_mid);
+	return retval;
+
+}
+
+/*
+ * Save project_id id to inode's xattr
+ * Locking: none
+ */
+int ext4_prjid_write(handle_t *handle, struct inode *inode,
+			unsigned int mid, int xflags)
+{
+	__le32 dsk_mid = cpu_to_le32(mid);
+	int retval;
+	retval = ext4_xattr_set_handle(handle, inode, EXT4_XATTR_INDEX_PROJECT_ID, "",
+				&dsk_mid, sizeof (dsk_mid), xflags);
+	if (retval > 0) {
+		if (retval != sizeof(dsk_mid))
+			retval = -EIO;
+		else
+			retval = 0;
+	}
+	return retval;
+}
+
+/*
+ * Change project_id id.
+ * Called under inode->i_mutex
+ */
+static int ext4_prjid_change(struct inode *inode, unsigned int new_mid)
+{
+	/*
+	 * One data_trans_blocks chunk for xattr update.
+	 * One quota_trans_blocks chunk for quota transfer, and one
+	 * quota_trans_block chunk for emergency quota rollback transfer,
+	 * because quota rollback may result new quota blocks allocation.
+	 */
+	unsigned credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) +
+		EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb) * 2;
+	qid_t qid[MAXQUOTAS];
+	int ret, ret2 = 0;
+	unsigned retries = 0;
+	handle_t *handle;
+
+	vfs_dq_init(inode);
+retry:
+	handle = ext4_journal_start(inode, credits);
+	if (IS_ERR(handle)) {
+		ret = PTR_ERR(handle);
+		ext4_std_error(inode->i_sb, ret);
+		goto out;
+	}
+	/* Inode may not have project_id xattr yet. Create it explicitly */
+	ret = ext4_prjid_write(handle, inode, EXT4_I(inode)->i_mid,
+				XATTR_CREATE);
+	if (ret == -EEXIST)
+		ret = 0;
+	if (ret) {
+		ret2 = ext4_journal_stop(handle);
+		if (ret2)
+			ret = ret2;
+		if (ret == -ENOSPC &&
+			ext4_should_retry_alloc(inode->i_sb, &retries))
+			goto retry;
+	}
+#ifdef CONFIG_QUOTA
+	qid[PRJQUOTA] = new_mid;
+	if (inode->i_sb->dq_op->transfer(inode, qid, 1 << PRJQUOTA))
+		ret = -EDQUOT;
+#endif
+	ret = ext4_prjid_write(handle, inode, new_mid, XATTR_REPLACE);
+	if (ret) {
+		/*
+		 * Function may fail only due to fatal error, Nor than less
+		 * we have try to rollback quota changes.
+		 */
+#ifdef CONFIG_QUOTA
+		qid[PRJQUOTA] = EXT4_I(inode)->i_mid;
+		if (inode->i_sb->dq_op->transfer(inode, qid, 1 << PRJQUOTA))
+			ret = -EDQUOT;
+#endif
+		ext4_std_error(inode->i_sb, ret);
+
+	}
+	EXT4_I(inode)->i_mid = new_mid;
+	ret2 = ext4_journal_stop(handle);
+out:
+	if (ret2)
+		ret = ret2;
+	return ret;
+}
+static size_t
+ext4_xattr_prjid_list(struct dentry *dentry, char *list, size_t list_size,
+		const char *name, size_t name_len, int type)
+{
+	if (list && XATTR_PRJID_LEN <= list_size)
+		memcpy(list, XATTR_PRJID, XATTR_PRJID_LEN);
+	return XATTR_PRJID_LEN;
+
+}
+
+static int
+ext4_xattr_prjid_get(struct dentry *dentry, const char *name,
+		void *buffer, size_t size, int type)
+{
+	int ret;
+	unsigned mid;
+	char buf[32];
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	ret = ext4_prjid_read(dentry->d_inode, &mid);
+	if (ret)
+		return ret;
+	snprintf(buf, sizeof(buf)-1, "%u", mid);
+	buf[31] = '\0';
+	strncpy(buffer, buf, size);
+	return strlen(buf);
+}
+
+static int
+ext4_xattr_prjid_set(struct dentry *dentry, const char *name,
+		const void *value, size_t size, int flags, int type)
+{
+	unsigned int new_mid;
+	if (strcmp(name, "") != 0)
+		return -EINVAL;
+	new_mid = simple_strtoul(value, (char **)&value, 0);
+	return ext4_prjid_change(dentry->d_inode, new_mid);
+}
+
+struct xattr_handler ext4_xattr_prjid_handler = {
+	.prefix = XATTR_PRJID,
+	.list = ext4_xattr_prjid_list,
+	.get = ext4_xattr_prjid_get,
+	.set = ext4_xattr_prjid_set,
+};
-- 
1.6.6
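As a reading aid, the "transfer quota, store the new id, roll back on failure" ordering discussed in this thread can be modelled in plain userspace C. This is a toy model under stated assumptions, not the kernel quota API and not the author's implementation: `struct quota_model`, `struct inode_model`, `quota_transfer`, `store_id` and `change_project_id` are all hypothetical names, and `store_id` merely stands in for the xattr update. The sketch returns the first error and updates the cached id only after both steps succeed. In the posted `ext4_prjid_change()`, by contrast, the `-EDQUOT` from the first transfer appears to be overwritten by the following xattr write, and `i_mid` is assigned unconditionally.

```c
#include <assert.h>
#include <errno.h>

#define NPROJ 4  /* hypothetical number of project ids in the model */

/* Toy quota state: per-project usage and hard limit, in blocks. */
struct quota_model {
        unsigned long usage[NPROJ];
        unsigned long limit[NPROJ];
};

/* Toy inode: cached project id, size in blocks, fault-injection flag. */
struct inode_model {
        unsigned int prjid;
        unsigned long blocks;
        int store_fails;        /* nonzero makes store_id() return -EIO */
};

/* Move the inode's blocks between projects; there is no "force" mode. */
static int quota_transfer(struct quota_model *q, const struct inode_model *ino,
                          unsigned int from, unsigned int to)
{
        if (q->usage[to] + ino->blocks > q->limit[to])
                return -EDQUOT;
        assert(q->usage[from] >= ino->blocks);
        q->usage[from] -= ino->blocks;
        q->usage[to] += ino->blocks;
        return 0;
}

/* Stand-in for persisting the id (the xattr write in the patch). */
static int store_id(struct inode_model *ino, unsigned int id)
{
        (void)id;
        return ino->store_fails ? -EIO : 0;
}

static int change_project_id(struct quota_model *q, struct inode_model *ino,
                             unsigned int new_id)
{
        unsigned int old_id = ino->prjid;
        int ret;

        if (new_id >= NPROJ)
                return -EINVAL;
        if (new_id == old_id)
                return 0;

        ret = quota_transfer(q, ino, old_id, new_id);
        if (ret)
                return ret;     /* nothing changed yet, nothing to undo */

        ret = store_id(ino, new_id);
        if (ret) {
                /*
                 * Undo the transfer. Without a force flag this can itself
                 * fail with -EDQUOT; the first error is reported either way.
                 */
                (void)quota_transfer(q, ino, new_id, old_id);
                return ret;
        }

        ino->prjid = new_id;    /* update the cache only on full success */
        return 0;
}
```

The ignored return value of the rollback transfer marks the spot where a force-charge flag would be needed. In this model nothing else touches the quota state, so the rollback always succeeds.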
end of thread, other threads:[~2010-02-20 10:58 UTC | newest]

Thread overview: 13+ messages
2010-02-18 16:45 [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 2/6] quota: switch reservation space management to aux_attribute Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 3/6] vfs: Add additional owner identifier Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 4/6] quota: Implement metagroup support for quota Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 5/6] ext4: enlarge mount option field Dmitry Monakhov
2010-02-18 16:45 ` [PATCH 6/6] ext4: Implement metagroup support for ext4 filesystem Dmitry Monakhov
2010-02-18 19:00 ` [PATCH 1/6] vfs: add per-sb auxiliary inode attribute table Brad Boyer
2010-02-18 19:34 ` Dmitry Monakhov
2010-02-18 23:31 ` [PATCH 0/6] RFC: introduce extended inode owner identifier v4 Dave Chinner
2010-02-19 10:16 ` Dmitry Monakhov
2010-02-19 23:31 ` Dave Chinner
2010-02-20 10:58 ` Dmitry Monakhov