[RFC PATCH 0/3] ceph: kernel client cephfs quota support

* [RFC PATCH 0/3] ceph: kernel client cephfs quota support
@ 2017-09-06 14:12 Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 1/3] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Luis Henriques @ 2017-09-06 14:12 UTC (permalink / raw)
  To: Yan, Zheng, Sage Weil, Ilya Dryomov, Jan Fajerski
  Cc: ceph-devel, Luis Henriques

A cephfs-specific quota implementation has been available in the
user-space fuse client for a while.  It allows an administrator to
restrict the number of bytes and/or the number of files in a filesystem
subtree.  However, it is enforced at the client level only, which means
that it requires cooperation between the different clients accessing the
filesystem.

This obviously assumes that all clients are trusted entities and will
respect the quotas, preventing users from exceeding the quota limits.
Since the kernel client doesn't support quotas, it has not been possible
to use it in a cluster where quotas are a requirement.

This patchset is an RFC that adds kernel client support for cephfs quotas
as they are currently implemented in the ceph fuse client.  Note however
that this patchset is not yet feature complete, as it only implements the
max_files quota (max_bytes is still missing).  I just wanted to have some
early review before continuing, especially of the reverse path walk code,
as this seems to be the perfect place to fail ;-)

I've obviously done some basic testing on this patchset but nothing really
complex -- a single client on a small (VM-based) cluster.  Jan (thanks a
lot!) has pushed a branch that should enable testing using teuthology, but
unfortunately we haven't tried it:

  https://github.com/jan--f/ceph/tree/wip-quota-kernel-testing

[ Note that the goal is to simply use the existing workunit quota.sh once
  the kernel client has support for both max_files and max_bytes. ]

Luis Henriques (3):
  ceph: quota: add initial infrastructure to support cephfs quotas
  ceph: quotas: support for ceph.quota.max_files
  ceph: quota: don't allow cross-quota renames

 fs/ceph/Makefile                   |   2 +-
 fs/ceph/dir.c                      |  15 ++++
 fs/ceph/file.c                     |   4 +-
 fs/ceph/inode.c                    |   6 ++
 fs/ceph/mds_client.c               |  21 ++++++
 fs/ceph/mds_client.h               |   2 +
 fs/ceph/quota.c                    | 136 +++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h                    |  10 +++
 fs/ceph/xattr.c                    |  45 ++++++++++++
 include/linux/ceph/ceph_features.h |   3 +-
 include/linux/ceph/ceph_fs.h       |  17 +++++
 11 files changed, 258 insertions(+), 3 deletions(-)
 create mode 100644 fs/ceph/quota.c

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/3] ceph: quota: add initial infrastructure to support cephfs quotas
  2017-09-06 14:12 [RFC PATCH 0/3] ceph: kernel client cephfs quota support Luis Henriques
@ 2017-09-06 14:12 ` Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 3/3] ceph: quota: don't allow cross-quota renames Luis Henriques
  2 siblings, 0 replies; 7+ messages in thread
From: Luis Henriques @ 2017-09-06 14:12 UTC (permalink / raw)
  To: Yan, Zheng, Sage Weil, Ilya Dryomov, Jan Fajerski
  Cc: ceph-devel, Luis Henriques

This patch adds the infrastructure required to support cephfs quotas as
they are currently implemented in the ceph fuse client.  Cephfs quotas can
be set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/Makefile                   |  2 +-
 fs/ceph/inode.c                    |  6 ++++
 fs/ceph/mds_client.c               | 21 ++++++++++++++
 fs/ceph/mds_client.h               |  2 ++
 fs/ceph/quota.c                    | 59 ++++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h                    |  8 ++++++
 fs/ceph/xattr.c                    | 45 +++++++++++++++++++++++++++++
 include/linux/ceph/ceph_features.h |  3 +-
 include/linux/ceph/ceph_fs.h       | 17 +++++++++++
 9 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 fs/ceph/quota.c
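
As an illustration of the commit message above, here is a minimal
user-space sketch of how a quota could be set through the extended
attribute.  It assumes the value is passed as a decimal string (as
setfattr would do) and that the directory lives on a cephfs mount whose
client accepts writes to this attribute.  The helper names quota_value()
and set_max_files() are hypothetical and are not part of this patch.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

/* Format a quota limit as a decimal string; a limit of 0 removes the quota. */
static int quota_value(char *buf, size_t len, uint64_t limit)
{
	return snprintf(buf, len, "%" PRIu64, limit);
}

/*
 * Hypothetical helper: set ceph.quota.max_files on a directory.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_max_files(const char *dir, uint64_t limit)
{
	char val[32];
	int n = quota_value(val, sizeof(val), limit);

	if (n < 0 || (size_t)n >= sizeof(val))
		return -EINVAL;
	/* The attribute value is the string without its trailing NUL. */
	if (setxattr(dir, "ceph.quota.max_files", val, (size_t)n, 0) < 0)
		return -errno;
	return 0;
}
```

Calling set_max_files(dir, 0) on the same directory would be the way to
remove the limit again, following the rule described above.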

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 85a4230b9bff..2e9a95615143 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -5,7 +5,7 @@
 obj-$(CONFIG_CEPH_FS) += ceph.o
 
 ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
-	export.o caps.o snap.o xattr.o \
+	export.o caps.o snap.o xattr.o quota.o \
 	mds_client.o mdsmap.o strings.o ceph_frag.o \
 	debugfs.o
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 220dfd87cbfa..e1bbce9a0b0a 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -443,6 +443,9 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	atomic64_set(&ci->i_complete_seq[1], 0);
 	ci->i_symlink = NULL;
 
+	ci->i_max_bytes = 0;
+	ci->i_max_files = 0;
+
 	memset(&ci->i_dir_layout, 0, sizeof(ci->i_dir_layout));
 	RCU_INIT_POINTER(ci->i_layout.pool_ns, NULL);
 
@@ -792,6 +795,9 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 	inode->i_rdev = le32_to_cpu(info->rdev);
 	inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
 
+	ci->i_max_bytes = iinfo->max_bytes;
+	ci->i_max_files = iinfo->max_files;
+
 	if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
 	    (issued & CEPH_CAP_AUTH_EXCL) == 0) {
 		inode->i_mode = le32_to_cpu(info->mode);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 666a9f274832..95220d37d9c4 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -100,6 +100,24 @@ static int parse_reply_info_in(void **p, void *end,
 	} else
 		info->inline_version = CEPH_INLINE_NONE;
 
+	if (features & CEPH_FEATURE_MDS_QUOTA) {
+		u8 struct_v, struct_compat;
+		u32 struct_len;
+
+		/* both struct_v and struct_compat are expected to be >= 1 */
+		ceph_decode_8_safe(p, end, struct_v, bad);
+		ceph_decode_8_safe(p, end, struct_compat, bad);
+		if (!struct_v || !struct_compat)
+			goto bad;
+		ceph_decode_32_safe(p, end, struct_len, bad);
+		ceph_decode_need(p, end, struct_len, bad);
+		ceph_decode_64_safe(p, end, info->max_bytes, bad);
+		ceph_decode_64_safe(p, end, info->max_files, bad);
+	} else {
+		info->max_bytes = 0;
+		info->max_files = 0;
+	}
+
 	info->pool_ns_len = 0;
 	info->pool_ns_data = NULL;
 	if (features & CEPH_FEATURE_FS_FILE_LAYOUT_V2) {
@@ -3990,6 +4008,9 @@ static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
 	case CEPH_MSG_CLIENT_LEASE:
 		handle_lease(mdsc, s, msg);
 		break;
+	case CEPH_MSG_CLIENT_QUOTA:
+		ceph_handle_quota(mdsc, s, msg);
+		break;
 
 	default:
 		pr_err("received unknown message type %d %s\n", type,
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index db57ae98ed34..ca75624317a9 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -47,6 +47,8 @@ struct ceph_mds_reply_info_in {
 	char *inline_data;
 	u32 pool_ns_len;
 	char *pool_ns_data;
+	u64 max_bytes;
+	u64 max_files;
 };
 
 struct ceph_mds_reply_dir_entry {
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
new file mode 100644
index 000000000000..c02d73a8d167
--- /dev/null
+++ b/fs/ceph/quota.c
@@ -0,0 +1,59 @@
+/*
+ * quota.c - CephFS quota
+ *
+ * Copyright (C) 2017 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "super.h"
+#include "mds_client.h"
+
+void ceph_handle_quota(struct ceph_mds_client *mdsc,
+		       struct ceph_mds_session *session,
+		       struct ceph_msg *msg)
+{
+	struct super_block *sb = mdsc->fsc->sb;
+	struct ceph_mds_quota *h = msg->front.iov_base;
+	struct ceph_vino vino;
+	struct inode *inode;
+	struct ceph_inode_info *ci;
+
+	if (msg->front.iov_len != sizeof(*h)) {
+		pr_err("%s corrupt message mds%d len %d\n", __func__,
+		       session->s_mds, (int)msg->front.iov_len);
+		ceph_msg_dump(msg);
+		return;
+	}
+
+	/* lookup inode */
+	vino.ino = le64_to_cpu(h->ino);
+	vino.snap = CEPH_NOSNAP;
+	inode = ceph_find_inode(sb, vino);
+	ci = ceph_inode(inode);
+
+	mutex_lock(&session->s_mutex);
+	session->s_seq++;
+	mutex_unlock(&session->s_mutex);
+
+	spin_lock(&ci->i_ceph_lock);
+	ci->i_rbytes = le64_to_cpu(h->rbytes);
+	ci->i_rfiles = le64_to_cpu(h->rfiles);
+	ci->i_rsubdirs = le64_to_cpu(h->rsubdirs);
+	ci->i_max_bytes = le64_to_cpu(h->max_bytes);
+	ci->i_max_files = le64_to_cpu(h->max_files);
+	spin_unlock(&ci->i_ceph_lock);
+
+	iput(inode);
+}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index f02a2225fe42..50c96ea7dc7c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -296,6 +296,9 @@ struct ceph_inode_info {
 	u64 i_rbytes, i_rfiles, i_rsubdirs;
 	u64 i_files, i_subdirs;
 
+	/* quotas */
+	u64 i_max_bytes, i_max_files;
+
 	struct rb_root i_fragtree;
 	int i_fragtree_nsplits;
 	struct mutex i_fragtree_mutex;
@@ -1004,4 +1007,9 @@ extern int lock_to_ceph_filelock(struct file_lock *fl, struct ceph_filelock *c);
 extern int ceph_fs_debugfs_init(struct ceph_fs_client *client);
 extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
 
+/* quota.c */
+extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
+			      struct ceph_mds_session *session,
+			      struct ceph_msg *msg);
+
 #endif /* _FS_CEPH_SUPER_H */
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index 11263f102e4c..96ba4bbd35ea 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -223,6 +223,32 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
 			(long)ci->i_rctime.tv_nsec);
 }
 
+/* quotas */
+
+static bool ceph_vxattrcb_quota_exists(struct ceph_inode_info *ci)
+{
+	return (ci->i_max_files || ci->i_max_bytes);
+}
+
+static size_t ceph_vxattrcb_quota(struct ceph_inode_info *ci, char *val,
+				  size_t size)
+{
+	return snprintf(val, size, "max_bytes=%llu max_files=%llu",
+			ci->i_max_bytes,
+			ci->i_max_files);
+}
+
+static size_t ceph_vxattrcb_quota_max_bytes(struct ceph_inode_info *ci,
+					    char *val, size_t size)
+{
+	return snprintf(val, size, "%llu", ci->i_max_bytes);
+}
+
+static size_t ceph_vxattrcb_quota_max_files(struct ceph_inode_info *ci,
+					    char *val, size_t size)
+{
+	return snprintf(val, size, "%llu", ci->i_max_files);
+}
 
 #define CEPH_XATTR_NAME(_type, _name)	XATTR_CEPH_PREFIX #_type "." #_name
 #define CEPH_XATTR_NAME2(_type, _name, _name2)	\
@@ -246,6 +272,15 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
 		.hidden = true,			\
 		.exists_cb = ceph_vxattrcb_layout_exists,	\
 	}
+#define XATTR_QUOTA_FIELD(_type, _name)					\
+	{								\
+		.name = CEPH_XATTR_NAME(_type, _name),			\
+		.name_size = sizeof(CEPH_XATTR_NAME(_type, _name)),	\
+		.getxattr_cb = ceph_vxattrcb_ ## _type ## _ ## _name,	\
+		.readonly = false,					\
+		.hidden = true,						\
+		.exists_cb = ceph_vxattrcb_quota_exists,		\
+	}
 
 static struct ceph_vxattr ceph_dir_vxattrs[] = {
 	{
@@ -269,6 +304,16 @@ static struct ceph_vxattr ceph_dir_vxattrs[] = {
 	XATTR_NAME_CEPH(dir, rsubdirs),
 	XATTR_NAME_CEPH(dir, rbytes),
 	XATTR_NAME_CEPH(dir, rctime),
+	{
+		.name = "ceph.quota",
+		.name_size = sizeof("ceph.quota"),
+		.getxattr_cb = ceph_vxattrcb_quota,
+		.readonly = false,
+		.hidden = true,
+		.exists_cb = ceph_vxattrcb_quota_exists,
+	},
+	XATTR_QUOTA_FIELD(quota, max_bytes),
+	XATTR_QUOTA_FIELD(quota, max_files),
 	{ .name = NULL, 0 }	/* Required table terminator */
 };
 static size_t ceph_dir_vxattrs_name_size;	/* total size of all names */
diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h
index 040dd105c3e7..6ac377dd18dc 100644
--- a/include/linux/ceph/ceph_features.h
+++ b/include/linux/ceph/ceph_features.h
@@ -208,7 +208,8 @@ DEFINE_CEPH_FEATURE_DEPRECATED(63, 1, RESERVED_BROKEN, LUMINOUS) // client-facin
 	 CEPH_FEATURE_SERVER_JEWEL |		\
 	 CEPH_FEATURE_MON_STATEFUL_SUB |	\
 	 CEPH_FEATURE_CRUSH_TUNABLES5 |		\
-	 CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING)
+	 CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING |	\
+	 CEPH_FEATURE_MDS_QUOTA)
 
 #define CEPH_FEATURES_REQUIRED_DEFAULT   \
 	(CEPH_FEATURE_NOSRCADDR |	 \
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index edf5b04b918a..ee415add6ced 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -133,6 +133,7 @@ struct ceph_dir_layout {
 #define CEPH_MSG_CLIENT_LEASE           0x311
 #define CEPH_MSG_CLIENT_SNAP            0x312
 #define CEPH_MSG_CLIENT_CAPRELEASE      0x313
+#define CEPH_MSG_CLIENT_QUOTA		0x314
 
 /* pool ops */
 #define CEPH_MSG_POOLOP_REPLY           48
@@ -802,4 +803,20 @@ struct ceph_mds_snap_realm {
 } __attribute__ ((packed));
 /* followed by my snap list, then prior parent snap list */
 
+/*
+ * quotas
+ */
+struct ceph_mds_quota {
+	__le64 ino;		/* ino */
+	struct ceph_timespec rctime;
+	__le64 rbytes;		/* dir stats */
+	__le64 rfiles;
+	__le64 rsubdirs;
+	__u8 struct_v;		/* compat */
+	__u8 struct_compat;
+	__le32 struct_len;
+	__le64 max_bytes;	/* quota max. bytes */
+	__le64 max_files;	/* quota max. files */
+} __attribute__ ((packed));
+
 #endif

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files
  2017-09-06 14:12 [RFC PATCH 0/3] ceph: kernel client cephfs quota support Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 1/3] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
@ 2017-09-06 14:12 ` Luis Henriques
  2017-09-07 14:22   ` Yan, Zheng
  2017-09-06 14:12 ` [RFC PATCH 3/3] ceph: quota: don't allow cross-quota renames Luis Henriques
  2 siblings, 1 reply; 7+ messages in thread
From: Luis Henriques @ 2017-09-06 14:12 UTC (permalink / raw)
  To: Yan, Zheng, Sage Weil, Ilya Dryomov, Jan Fajerski
  Cc: ceph-devel, Luis Henriques

This patch adds support for the max_files quota.  It hooks into all the
ceph functions that add new filesystem objects that need to be checked
against the quota limit.  -EDQUOT is returned when this limit is hit.

Note that we're not checking quotas in ceph_link().  ceph_link() doesn't
really create a new inode, and since the MDS doesn't update the directory
statistics when a new hard link is created (it only does so for symlinks),
hard links are not accounted as new files.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/dir.c   | 11 +++++++++++
 fs/ceph/file.c  |  4 +++-
 fs/ceph/quota.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h |  1 +
 4 files changed, 57 insertions(+), 1 deletion(-)
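
The check described above can be modelled outside the kernel as a walk
from a directory up to the root, comparing each ancestor's recursive
entry count against its max_files limit.  The sketch below is only an
illustration under simplifying assumptions: struct dir_node is a
hypothetical stand-in for the inode/dentry pair, and it ignores the
locking, rename_lock retry and reference counting that the real code has
to deal with.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a directory; not the kernel structure. */
struct dir_node {
	struct dir_node *parent;	/* NULL at the filesystem root */
	uint64_t max_files;		/* 0 means no quota is set here */
	uint64_t rfiles;		/* recursive file count */
	uint64_t rsubdirs;		/* recursive subdirectory count */
};

/*
 * Walk from dir up to the root.  Return true as soon as one ancestor
 * (including dir itself and the root) has reached its max_files limit.
 */
static bool files_quota_exceeded(const struct dir_node *dir)
{
	for (; dir; dir = dir->parent) {
		uint64_t rentries = dir->rfiles + dir->rsubdirs;

		if (dir->max_files && rentries >= dir->max_files)
			return true;
	}
	return false;
}
```

A create-like operation in this model would return -EDQUOT whenever
files_quota_exceeded() is true for the parent directory.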

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index ef7240ace576..fb6adcf0ff51 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -815,6 +815,9 @@ static int ceph_mknod(struct inode *dir, struct dentry *dentry,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return -EROFS;
 
+	if (ceph_quota_is_quota_files_exceeded(dir))
+		return -EDQUOT;
+
 	err = ceph_pre_init_acls(dir, &mode, &acls);
 	if (err < 0)
 		return err;
@@ -868,6 +871,9 @@ static int ceph_symlink(struct inode *dir, struct dentry *dentry,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return -EROFS;
 
+	if (ceph_quota_is_quota_files_exceeded(dir))
+		return -EDQUOT;
+
 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
@@ -917,6 +923,11 @@ static int ceph_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 		goto out;
 	}
 
+	if (ceph_quota_is_quota_files_exceeded(dir)) {
+		err = -EDQUOT;
+		goto out;
+	}
+
 	mode |= S_IFDIR;
 	err = ceph_pre_init_acls(dir, &mode, &acls);
 	if (err < 0)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3d48c415f3cb..708a9b841382 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -370,7 +370,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_request *req;
 	struct dentry *dn;
 	struct ceph_acls_info acls = {};
-       int mask;
+	int mask;
 	int err;
 
 	dout("atomic_open %p dentry %p '%pd' %s flags %d mode 0%o\n",
@@ -381,6 +381,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		return -ENAMETOOLONG;
 
 	if (flags & O_CREAT) {
+		if (ceph_quota_is_quota_files_exceeded(dir))
+			return -EDQUOT;
 		err = ceph_pre_init_acls(dir, &mode, &acls);
 		if (err < 0)
 			return err;
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index c02d73a8d167..1bd02658f16a 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -57,3 +57,45 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
 
 	iput(inode);
 }
+
+bool ceph_quota_is_quota_files_exceeded(struct inode *inode)
+{
+	struct ceph_inode_info *ci;
+	struct dentry *next, *parent;
+	u64 max_files;
+	u64 rentries = 0;
+	unsigned seq;
+	bool result = false;
+
+	WARN_ON(!S_ISDIR(inode->i_mode));
+
+retry:
+	seq = read_seqbegin(&rename_lock);
+	ci = ceph_inode(inode);
+	next = d_find_any_alias(inode);
+
+	while (true) {
+		spin_lock(&ci->i_ceph_lock);
+		max_files = ci->i_max_files;
+		rentries = ci->i_rfiles + ci->i_rsubdirs;
+		spin_unlock(&ci->i_ceph_lock);
+
+		if ((max_files && (rentries >= max_files)) || IS_ROOT(next))
+			break;
+
+		parent = dget_parent(next);
+		ci = ceph_inode(d_inode(parent));
+		dput(next);
+		next = parent;
+	}
+
+	dput(next);
+
+	if (read_seqretry(&rename_lock, seq))
+		goto retry;
+
+	if (max_files && (rentries >= max_files))
+		result = true;
+
+	return result;
+}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 50c96ea7dc7c..ef131107dbf6 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1011,5 +1011,6 @@ extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
 extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
 			      struct ceph_mds_session *session,
 			      struct ceph_msg *msg);
+extern bool ceph_quota_is_quota_files_exceeded(struct inode *inode);
 
 #endif /* _FS_CEPH_SUPER_H */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 3/3] ceph: quota: don't allow cross-quota renames
  2017-09-06 14:12 [RFC PATCH 0/3] ceph: kernel client cephfs quota support Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 1/3] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
  2017-09-06 14:12 ` [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files Luis Henriques
@ 2017-09-06 14:12 ` Luis Henriques
  2 siblings, 0 replies; 7+ messages in thread
From: Luis Henriques @ 2017-09-06 14:12 UTC (permalink / raw)
  To: Yan, Zheng, Sage Weil, Ilya Dryomov, Jan Fajerski
  Cc: ceph-devel, Luis Henriques

This patch changes ceph_rename() so that -EXDEV is returned if an attempt
is made to move a file between two directory trees that have different
quota roots.

Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/dir.c   |  4 ++++
 fs/ceph/quota.c | 35 +++++++++++++++++++++++++++++++++++
 fs/ceph/super.h |  1 +
 3 files changed, 40 insertions(+)
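
To make the rule above concrete, here is a small user-space model of the
quota-root comparison.  It is a sketch under simplifying assumptions:
struct qdir and the function names are hypothetical, parent pointers
replace the dentry walk, and no locking or reference counting is shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a directory; not the kernel structure. */
struct qdir {
	struct qdir *parent;	/* NULL at the filesystem root */
	uint64_t max_files;	/* 0 means unset */
	uint64_t max_bytes;	/* 0 means unset */
};

static bool has_quota(const struct qdir *d)
{
	return d->max_files || d->max_bytes;
}

/* Closest ancestor (or d itself) with a quota set, or NULL if none. */
static const struct qdir *quota_root(const struct qdir *d)
{
	while (d && !has_quota(d))
		d = d->parent;
	return d;
}

/* Model of the rename check: 0 if allowed, -EXDEV for cross-quota moves. */
static int check_rename(const struct qdir *old_dir, const struct qdir *new_dir)
{
	if (old_dir != new_dir && quota_root(old_dir) != quota_root(new_dir))
		return -EXDEV;
	return 0;
}
```

Note that in this model two directories with no quota anywhere above them
compare equal (both roots are NULL), so renames between them are allowed.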

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index fb6adcf0ff51..dba96e227b43 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1087,6 +1087,10 @@ static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
 		else
 			return -EROFS;
 	}
+	/* don't allow cross-quota renames */
+	if ((old_dir != new_dir) && (!ceph_quota_is_same_root(old_dir, new_dir)))
+		return -EXDEV;
+
 	dout("rename dir %p dentry %p to dir %p dentry %p\n",
 	     old_dir, old_dentry, new_dir, new_dentry);
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index 1bd02658f16a..80d5231a0905 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -20,6 +20,11 @@
 #include "super.h"
 #include "mds_client.h"
 
+static inline bool ceph_has_quota(struct ceph_inode_info *ci)
+{
+	return (ci->i_max_files || ci->i_max_bytes);
+}
+
 void ceph_handle_quota(struct ceph_mds_client *mdsc,
 		       struct ceph_mds_session *session,
 		       struct ceph_msg *msg)
@@ -58,6 +63,36 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
 	iput(inode);
 }
 
+static struct ceph_inode_info *get_quota_root(struct inode *inode)
+{
+	struct ceph_inode_info *ci;
+	struct dentry *next, *parent;
+
+	next = d_find_any_alias(inode);
+	ci = ceph_inode(d_inode(next));
+	while (!ceph_has_quota(ci) && !IS_ROOT(next)) {
+		parent = dget_parent(next);
+		dput(next);
+		next = parent;
+		ci = ceph_inode(d_inode(next));
+	}
+
+	dput(next);
+	if (ceph_has_quota(ci))
+		return ci;
+	return NULL;
+}
+
+bool ceph_quota_is_same_root(struct inode *old, struct inode *new)
+{
+	struct ceph_inode_info *ci_old, *ci_new;
+
+	ci_old = get_quota_root(old);
+	ci_new = get_quota_root(new);
+
+	return (ci_old == ci_new);
+}
+
 bool ceph_quota_is_quota_files_exceeded(struct inode *inode)
 {
 	struct ceph_inode_info *ci;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index ef131107dbf6..5e4f23ab556f 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1012,5 +1012,6 @@ extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
 			      struct ceph_mds_session *session,
 			      struct ceph_msg *msg);
 extern bool ceph_quota_is_quota_files_exceeded(struct inode *inode);
+extern bool ceph_quota_is_same_root(struct inode *old, struct inode *new);
 
 #endif /* _FS_CEPH_SUPER_H */

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files
  2017-09-06 14:12 ` [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files Luis Henriques
@ 2017-09-07 14:22   ` Yan, Zheng
  2017-09-08  8:33     ` Luis Henriques
  0 siblings, 1 reply; 7+ messages in thread
From: Yan, Zheng @ 2017-09-07 14:22 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Sage Weil, Ilya Dryomov, Jan Fajerski, Ceph Development,
	Jeff Layton


> On 6 Sep 2017, at 22:12, Luis Henriques <lhenriques@suse.com> wrote:
> 
> This patch adds support for the max_files quota.  It hooks into all the
> ceph functions that add new filesystem objects that need to be checked
> against the quota limit.  -EDQUOT is returned when this limit is hit.
> 
> Note that we're not checking quotas in ceph_link().  ceph_link() doesn't
> really create a new inode, and since the MDS doesn't update the directory
> statistics when a new hard link is created (it only does so for symlinks),
> hard links are not accounted as new files.
> 
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
> fs/ceph/dir.c   | 11 +++++++++++
> fs/ceph/file.c  |  4 +++-
> fs/ceph/quota.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> fs/ceph/super.h |  1 +
> 4 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index ef7240ace576..fb6adcf0ff51 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -815,6 +815,9 @@ static int ceph_mknod(struct inode *dir, struct dentry *dentry,
> 	if (ceph_snap(dir) != CEPH_NOSNAP)
> 		return -EROFS;
> 
> +	if (ceph_quota_is_quota_files_exceeded(dir))
> +		return -EDQUOT;
> +
> 	err = ceph_pre_init_acls(dir, &mode, &acls);
> 	if (err < 0)
> 		return err;
> @@ -868,6 +871,9 @@ static int ceph_symlink(struct inode *dir, struct dentry *dentry,
> 	if (ceph_snap(dir) != CEPH_NOSNAP)
> 		return -EROFS;
> 
> +	if (ceph_quota_is_quota_files_exceeded(dir))
> +		return -EDQUOT;
> +
> 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
> 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
> 	if (IS_ERR(req)) {
> @@ -917,6 +923,11 @@ static int ceph_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> 		goto out;
> 	}
> 
> +	if (ceph_quota_is_quota_files_exceeded(dir)) {
> +		err = -EDQUOT;
> +		goto out;
> +	}
> +
> 	mode |= S_IFDIR;
> 	err = ceph_pre_init_acls(dir, &mode, &acls);
> 	if (err < 0)
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 3d48c415f3cb..708a9b841382 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -370,7 +370,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
> 	struct ceph_mds_request *req;
> 	struct dentry *dn;
> 	struct ceph_acls_info acls = {};
> -       int mask;
> +	int mask;
> 	int err;
> 
> 	dout("atomic_open %p dentry %p '%pd' %s flags %d mode 0%o\n",
> @@ -381,6 +381,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
> 		return -ENAMETOOLONG;
> 
> 	if (flags & O_CREAT) {
> +		if (ceph_quota_is_quota_files_exceeded(dir))
> +			return -EDQUOT;
> 		err = ceph_pre_init_acls(dir, &mode, &acls);
> 		if (err < 0)
> 			return err;
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> index c02d73a8d167..1bd02658f16a 100644
> --- a/fs/ceph/quota.c
> +++ b/fs/ceph/quota.c
> @@ -57,3 +57,45 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
> 
> 	iput(inode);
> }
> +
> +bool ceph_quota_is_quota_files_exceeded(struct inode *inode)
> +{
> +	struct ceph_inode_info *ci;
> +	struct dentry *next, *parent;
> +	u64 max_files;
> +	u64 rentries = 0;
> +	unsigned seq;
> +	bool result = false;
> +
> +	WARN_ON(!S_ISDIR(inode->i_mode));
> +
> +retry:
> +	seq = read_seqbegin(&rename_lock);
> +	ci = ceph_inode(inode);
> +	next = d_find_any_alias(inode);
> +
> +	while (true) {
> +		spin_lock(&ci->i_ceph_lock);
> +		max_files = ci->i_max_files;
> +		rentries = ci->i_rfiles + ci->i_rsubdirs;
> +		spin_unlock(&ci->i_ceph_lock);
> +
> +		if ((max_files && (rentries >= max_files)) || IS_ROOT(next))
> +			break;
> +
> +		parent = dget_parent(next);
> +		ci = ceph_inode(d_inode(parent));
> +		dput(next);
> +		next = parent;
> +	}
> +
> +	dput(next);
> +
> +	if (read_seqretry(&rename_lock, seq))
> +		goto retry;
> +
> +	if (max_files && (rentries >= max_files))
> +		result = true;

This bottom-up dentry traversal code worries me. I vaguely remember that
bottom-up dentry traversal in the kernel is discouraged. Also, there may be
multiple clients modifying the filesystem at the same time, and the
rename_lock does not help with that. That's why the user-space code
Client::get_quota_root() checks the dentry lease and looks up the parent.
I’m not sure if we can do the same operations in the kernel, because
locking is much more complex there.

For the long term, I prefer unifying quota and snapshot implementation. The inode
trace in MClientReply contains information about which quota realm the inode belongs
to. So client can find quota information easily. (This requires bigger change for both
mds and client)

Regards
Yan, Zheng



> +
> +	return result;
> +}
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 50c96ea7dc7c..ef131107dbf6 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -1011,5 +1011,6 @@ extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
> extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
> 			      struct ceph_mds_session *session,
> 			      struct ceph_msg *msg);
> +extern bool ceph_quota_is_quota_files_exceeded(struct inode *inode);
> 
> #endif /* _FS_CEPH_SUPER_H */


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files
  2017-09-07 14:22   ` Yan, Zheng
@ 2017-09-08  8:33     ` Luis Henriques
  2017-09-08 10:43       ` Yan, Zheng
  0 siblings, 1 reply; 7+ messages in thread
From: Luis Henriques @ 2017-09-08  8:33 UTC (permalink / raw)
  To: Yan, Zheng
  Cc: Sage Weil, Ilya Dryomov, Jan Fajerski, Ceph Development,
	Jeff Layton

"Yan, Zheng" <zyan@redhat.com> writes:

>> On 6 Sep 2017, at 22:12, Luis Henriques <lhenriques@suse.com> wrote:
<snip>
> This bottom-up dentry traversal code worries me. I vaguely remember that
> bottom-up dentry traversal in the kernel is discouraged. Also, there may be
> multiple clients modifying the filesystem at the same time, and the
> rename_lock does not help with that. That's why the user-space code
> Client::get_quota_root() checks the dentry lease and looks up the parent.
> I’m not sure if we can do the same operations in the kernel, because
> locking is much more complex there.

So, you're saying that in addition to using the rename_lock (for local
renames), that loop will also need to do something similar to what's
being done already in function ceph_d_revalidate.  I.e., it needs to
validate the lease (as in function dentry_lease_is_valid) and send a
CEPH_MDS_OP_LOOKUP if a dentry is invalid.  Or am I missing something?

> For the long term, I prefer unifying quota and snapshot implementation. The inode
> trace in MClientReply contains information about which quota realm the inode belongs
> to. So client can find quota information easily. (This requires bigger change for both
> mds and client)

My motivation for trying to bring the kernel client a bit closer to
the fuse client was that there are currently valid use-cases for this
quota implementation, with all its limitations.

Now, I completely agree that ideally the core quota implementation
should be moved to the MDS.  This would simplify the client side,
and, above all, would remove the limitation of requiring client
cooperation.

Obviously, I would be more than happy to help on the kernel client
side of this solution.  But I'm afraid that the real hard work would
be on the MDS code, where things such as multi-MDS and dir
fragmentation would make this solution quite complex.

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 2/3] ceph: quotas: support for ceph.quota.max_files
  2017-09-08  8:33     ` Luis Henriques
@ 2017-09-08 10:43       ` Yan, Zheng
  0 siblings, 0 replies; 7+ messages in thread
From: Yan, Zheng @ 2017-09-08 10:43 UTC (permalink / raw)
  To: Luis Henriques
  Cc: Sage Weil, Ilya Dryomov, Jan Fajerski, Ceph Development,
	Jeff Layton


> On 8 Sep 2017, at 16:33, Luis Henriques <lhenriques@suse.com> wrote:
> 
> "Yan, Zheng" <zyan@redhat.com> writes:
> 
>>> On 6 Sep 2017, at 22:12, Luis Henriques <lhenriques@suse.com> wrote:
> <snip>
>> This bottom-up dentry traversal code worries me. I vaguely remember that
>> bottom-up dentry traversal in the kernel is discouraged. Also, there may be
>> multiple clients modifying the filesystem at the same time, and the
>> rename_lock does not help with that. That's why the user-space code
>> Client::get_quota_root() checks the dentry lease and looks up the parent.
>> I’m not sure if we can do the same operations in the kernel, because
>> locking is much more complex there.
> 
> So, you're saying that in addition to using the rename_lock (for local
> renames), that loop will also need to do something similar to what's
> being done already in function ceph_d_revalidate.  I.e., it needs to
> validate the lease (as in function dentry_lease_is_valid) and send a
> CEPH_MDS_OP_LOOKUP if a dentry is invalid.  Or am I missing something?

yes

> 
>> For the long term, I prefer unifying quota and snapshot implementation. The inode
>> trace in MClientReply contains information about which quota realm the inode belongs
>> to. So client can find quota information easily. (This requires bigger change for both
>> mds and client)
> 
> My motivation for trying to bring the kernel client a bit closer to
> the fuse client was that there are currently valid use-cases for this
> quota implementation, with all its limitations.
> 
> Now, I completely agree that ideally the core quota implementation
> should be moved to the MDS.  This would simplify the client side,
> and, above all, would remove the limitation of requiring client
> cooperation.
> 
> Obviously, I would be more than happy to help on the kernel client
> side of this solution.  But I'm afraid that the real hard work would
> be on the MDS code, where things such as multi-MDS and dir
> fragmentation would make this solution quite complex.

I will start to hammer this code soon, please wait a few days.

Regards
Yan, Zheng

> 
> Cheers,
> -- 
> Luís


^ permalink raw reply	[flat|nested] 7+ messages in thread
