[RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support


* [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support
@ 2017-12-18 15:38 Luis Henriques
  2017-12-18 15:38 ` [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection Luis Henriques
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Luis Henriques @ 2017-12-18 15:38 UTC
  To: ceph-devel; +Cc: Yan, Zheng, Jeff Layton, Jan Fajerski, Luis Henriques

A cephfs-specific quota implementation has been available in the
user-space fuse client for a while.  It allows an administrator to
restrict the number of bytes and/or the number of files in a filesystem
subtree.  Quotas are, however, enforced at the client level only, which
means that cooperation is required between the different clients
accessing the filesystem.

This obviously assumes that all clients are trusted entities and will
respect the quotas, preventing users from exceeding the quota limits.
Since the kernel client doesn't support quotas, it has not been possible
to use it in a cluster where quotas are a requirement.

This patchset is an RFC that adds kernel client support for cephfs
quotas as they are currently implemented in the ceph fuse client.  Note,
however, that this patchset is not yet feature complete, as it only
implements the max_files quota (max_bytes is still missing).

** Changes since v1 **

Instead of trying to do a reverse path walk to find the "quota realm"
for a given directory, this patchset is now using snaprealms.  Thus, for
testing it, a modified MDS is required:

  https://github.com/ukernel/ceph/tree/wip-cephfs-quota-realm

This modified MDS creates a snaprealm when a quota is set on a
directory.  This means that a client only needs to walk up the snaprealm
hierarchy to find a directory that has quotas, instead of doing a full
reverse path walk.

Note however that this requires an extra patch that adds a seqlock (1st
patch in series) to detect changes in the snaprealm hierarchy.

Luis Henriques (4):
  ceph: add seqlock for snaprealm hierarchy change detection
  ceph: quota: add initial infrastructure to support cephfs quotas
  ceph: quotas: support for ceph.quota.max_files
  ceph: quota: don't allow cross-quota renames

 fs/ceph/Makefile                   |   2 +-
 fs/ceph/dir.c                      |  16 +++
 fs/ceph/file.c                     |   4 +-
 fs/ceph/inode.c                    |   6 ++
 fs/ceph/mds_client.c               |  23 +++++
 fs/ceph/mds_client.h               |   2 +
 fs/ceph/quota.c                    | 197 +++++++++++++++++++++++++++++++++++++
 fs/ceph/snap.c                     |  45 +++++++--
 fs/ceph/super.h                    |  12 +++
 fs/ceph/xattr.c                    |  44 +++++++++
 include/linux/ceph/ceph_features.h |   3 +-
 include/linux/ceph/ceph_fs.h       |  17 ++++
 12 files changed, 362 insertions(+), 9 deletions(-)
 create mode 100644 fs/ceph/quota.c


* [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection
  2017-12-18 15:38 [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support Luis Henriques
@ 2017-12-18 15:38 ` Luis Henriques
  2017-12-19  9:22   ` Yan, Zheng
  2017-12-18 15:39 ` [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Luis Henriques @ 2017-12-18 15:38 UTC
  To: ceph-devel; +Cc: Yan, Zheng, Jeff Layton, Jan Fajerski, Luis Henriques

It is possible to receive an update to the snaprealms hierarchy from an
MDS while walking through this hierarchy.  This patch adds a mechanism
similar to the one used in dcache to detect renames in lookups.  A new
seqlock is used to allow a retry in case a change has occurred while
walking through the snaprealms.
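
The retry scheme described above can be sketched in userspace C.  This
is only an illustration of the counter protocol, not the kernel's
seqlock_t: it assumes a single writer, uses C11 atomics, and the names
seq_counter, seq_write_begin, seq_write_end, seq_read_begin and
seq_read_retry are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a seqlock (single writer only). */
struct seq_counter {
	atomic_uint sequence;	/* odd while a writer is mid-update */
};

static void seq_write_begin(struct seq_counter *s)
{
	atomic_fetch_add(&s->sequence, 1);	/* counter becomes odd */
}

static void seq_write_end(struct seq_counter *s)
{
	unsigned prev = atomic_fetch_add(&s->sequence, 1);

	assert(prev & 1);	/* must pair with seq_write_begin() */
}

/* Wait until no writer is active, then return the sequence to check later. */
static unsigned seq_read_begin(struct seq_counter *s)
{
	unsigned seq;

	do {
		seq = atomic_load(&s->sequence);
	} while (seq & 1);
	return seq;
}

/* True if a writer ran since seq_read_begin(); the reader must start over. */
static bool seq_read_retry(struct seq_counter *s, unsigned seq)
{
	return atomic_load(&s->sequence) != seq;
}
```

A reader calls seq_read_begin(), walks the hierarchy, and repeats the
walk for as long as seq_read_retry() returns true.  The counter only
detects a concurrent change; it does nothing to keep the objects being
walked alive.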

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/snap.c  | 45 +++++++++++++++++++++++++++++++++++++++------
 fs/ceph/super.h |  2 ++
 2 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 8a2ca41e4b97..8b9d6c7c0df4 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -54,6 +54,25 @@
  * console).
  */
 
+/*
+ * While walking through the snaprealm hierarchy it is possible that
+ * this hierarchy is updated (e.g., when a different client moves
+ * directories around).  snaprealm_lock isn't supposed to prevent this
+ * but, just like the rename_lock in dcache, to detect that this has
+ * happened so that a lookup can be retried.
+ *
+ * Here's a typical usage pattern for this lock:
+ *
+ * retry:
+ * 	seq = read_seqbegin(&snaprealm_lock);
+ *	realm = ci->i_snap_realm;
+ *	ceph_get_snap_realm(mdsc, realm);
+ *	... do stuff ...
+ *	ceph_put_snap_realm(mdsc, realm);
+ *	if (read_seqretry(&snaprealm_lock, seq))
+ *		goto retry;
+ */
+DEFINE_SEQLOCK(snaprealm_lock);
 
 /*
  * increase ref count for the realm
@@ -81,10 +100,12 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
 static void __insert_snap_realm(struct rb_root *root,
 				struct ceph_snap_realm *new)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct ceph_snap_realm *r = NULL;
 
+	write_seqlock(&snaprealm_lock);
+	p = &root->rb_node;
 	while (*p) {
 		parent = *p;
 		r = rb_entry(parent, struct ceph_snap_realm, node);
@@ -98,6 +119,7 @@ static void __insert_snap_realm(struct rb_root *root,
 
 	rb_link_node(&new->node, parent, p);
 	rb_insert_color(&new->node, root);
+	write_sequnlock(&snaprealm_lock);
 }
 
 /*
@@ -136,9 +158,14 @@ static struct ceph_snap_realm *ceph_create_snap_realm(
 static struct ceph_snap_realm *__lookup_snap_realm(struct ceph_mds_client *mdsc,
 						   u64 ino)
 {
-	struct rb_node *n = mdsc->snap_realms.rb_node;
-	struct ceph_snap_realm *r;
-
+	struct rb_node *n;
+	struct ceph_snap_realm *realm, *r;
+	unsigned seq;
+
+retry:
+	realm = NULL;
+	seq = read_seqbegin(&snaprealm_lock);
+	n = mdsc->snap_realms.rb_node;
 	while (n) {
 		r = rb_entry(n, struct ceph_snap_realm, node);
 		if (ino < r->ino)
@@ -147,10 +174,14 @@ static struct ceph_snap_realm *__lookup_snap_realm(struct ceph_mds_client *mdsc,
 			n = n->rb_right;
 		else {
 			dout("lookup_snap_realm %llx %p\n", r->ino, r);
-			return r;
+			realm = r;
+			break;
 		}
 	}
-	return NULL;
+
+	if (read_seqretry(&snaprealm_lock, seq))
+		goto retry;
+	return realm;
 }
 
 struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc,
@@ -174,7 +205,9 @@ static void __destroy_snap_realm(struct ceph_mds_client *mdsc,
 {
 	dout("__destroy_snap_realm %p %llx\n", realm, realm->ino);
 
+	write_seqlock(&snaprealm_lock);
 	rb_erase(&realm->node, &mdsc->snap_realms);
+	write_sequnlock(&snaprealm_lock);
 
 	if (realm->parent) {
 		list_del_init(&realm->child_item);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 2beeec07fa76..6474e8d875b7 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -760,6 +760,8 @@ static inline int default_congestion_kb(void)
 
 
 /* snap.c */
+extern seqlock_t snaprealm_lock;
+
 struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc,
 					       u64 ino);
 extern void ceph_get_snap_realm(struct ceph_mds_client *mdsc,


* [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas
  2017-12-18 15:38 [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support Luis Henriques
  2017-12-18 15:38 ` [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection Luis Henriques
@ 2017-12-18 15:39 ` Luis Henriques
  2017-12-19  9:24   ` Yan, Zheng
  2017-12-18 15:39 ` [RFC v2 PATCH 3/4] ceph: quotas: support for ceph.quota.max_files Luis Henriques
  2017-12-18 15:39 ` [RFC v2 PATCH 4/4] ceph: quota: don't allow cross-quota renames Luis Henriques
  3 siblings, 1 reply; 9+ messages in thread
From: Luis Henriques @ 2017-12-18 15:39 UTC
  To: ceph-devel; +Cc: Yan, Zheng, Jeff Layton, Jan Fajerski, Luis Henriques

This patch adds the infrastructure required to support cephfs quotas as
they are currently implemented in the ceph fuse client.  Cephfs quotas can be
set on any directory, and can restrict the number of bytes or the number
of files stored beneath that point in the directory hierarchy.

Quotas are set using the extended attributes 'ceph.quota.max_files' and
'ceph.quota.max_bytes', and can be removed by setting these attributes to
'0'.
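
On the wire, this patch expects the MDS to send the two limits in a
versioned block: struct_v, struct_compat, struct_len, then max_bytes and
max_files as little-endian 64-bit values.  A minimal userspace sketch of
a bounds-checked decoder for that layout follows.  The names quota_info,
get_le and decode_quota are hypothetical, and skipping ahead by
struct_len (plus rejecting a length below 16) is an assumption of this
sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct quota_info {
	uint64_t max_bytes;
	uint64_t max_files;
};

/* Read an n-byte (n <= 8) little-endian integer starting at p. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/*
 * Decode: u8 struct_v, u8 struct_compat, le32 struct_len,
 *         le64 max_bytes, le64 max_files.
 * On success, *p is advanced past the whole block.
 */
static bool decode_quota(const uint8_t **p, const uint8_t *end,
			 struct quota_info *out)
{
	const uint8_t *cur;
	uint32_t len;

	assert(p && *p && end && out);
	cur = *p;
	if ((size_t)(end - cur) < 6)
		return false;
	if (cur[0] == 0 || cur[1] == 0)	/* both versions must be >= 1 */
		return false;
	len = (uint32_t)get_le(cur + 2, 4);
	cur += 6;
	if (len < 16 || (size_t)(end - cur) < len)
		return false;
	out->max_bytes = get_le(cur, 8);
	out->max_files = get_le(cur + 8, 8);
	*p = cur + len;	/* assumption: skip any fields a newer encoder added */
	return true;
}
```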

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/Makefile                   |  2 +-
 fs/ceph/inode.c                    |  6 ++++
 fs/ceph/mds_client.c               | 23 +++++++++++++++
 fs/ceph/mds_client.h               |  2 ++
 fs/ceph/quota.c                    | 59 ++++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h                    |  8 ++++++
 fs/ceph/xattr.c                    | 44 ++++++++++++++++++++++++++++
 include/linux/ceph/ceph_features.h |  3 +-
 include/linux/ceph/ceph_fs.h       | 17 +++++++++++
 9 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 fs/ceph/quota.c

diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
index 174f5709e508..a699e320393f 100644
--- a/fs/ceph/Makefile
+++ b/fs/ceph/Makefile
@@ -6,7 +6,7 @@
 obj-$(CONFIG_CEPH_FS) += ceph.o
 
 ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
-	export.o caps.o snap.o xattr.o \
+	export.o caps.o snap.o xattr.o quota.o \
 	mds_client.o mdsmap.o strings.o ceph_frag.o \
 	debugfs.o
 
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index ab81652198c4..8a0ba96e105d 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -441,6 +441,9 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	atomic64_set(&ci->i_complete_seq[1], 0);
 	ci->i_symlink = NULL;
 
+	ci->i_max_bytes = 0;
+	ci->i_max_files = 0;
+
 	memset(&ci->i_dir_layout, 0, sizeof(ci->i_dir_layout));
 	RCU_INIT_POINTER(ci->i_layout.pool_ns, NULL);
 
@@ -790,6 +793,9 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
 	inode->i_rdev = le32_to_cpu(info->rdev);
 	inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
 
+	ci->i_max_bytes = iinfo->max_bytes;
+	ci->i_max_files = iinfo->max_files;
+
 	if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
 	    (issued & CEPH_CAP_AUTH_EXCL) == 0) {
 		inode->i_mode = le32_to_cpu(info->mode);
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 1b468250e947..2290056d13fc 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -100,6 +100,26 @@ static int parse_reply_info_in(void **p, void *end,
 	} else
 		info->inline_version = CEPH_INLINE_NONE;
 
+	if (features & CEPH_FEATURE_MDS_QUOTA) {
+		u8 struct_v, struct_compat;
+		u32 struct_len;
+
+		/*
+		 * both struct_v and struct_compat are expected to be >= 1
+		 */
+		ceph_decode_8_safe(p, end, struct_v, bad);
+		ceph_decode_8_safe(p, end, struct_compat, bad);
+		if (!struct_v || !struct_compat)
+			goto bad;
+		ceph_decode_32_safe(p, end, struct_len, bad);
+		ceph_decode_need(p, end, struct_len, bad);
+		ceph_decode_64_safe(p, end, info->max_bytes, bad);
+		ceph_decode_64_safe(p, end, info->max_files, bad);
+	} else {
+		info->max_bytes = 0;
+		info->max_files = 0;
+	}
+
 	info->pool_ns_len = 0;
 	info->pool_ns_data = NULL;
 	if (features & CEPH_FEATURE_FS_FILE_LAYOUT_V2) {
@@ -4064,6 +4084,9 @@ static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
 	case CEPH_MSG_CLIENT_LEASE:
 		handle_lease(mdsc, s, msg);
 		break;
+	case CEPH_MSG_CLIENT_QUOTA:
+		ceph_handle_quota(mdsc, s, msg);
+		break;
 
 	default:
 		pr_err("received unknown message type %d %s\n", type,
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 837ac4b087a0..7af576733948 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -49,6 +49,8 @@ struct ceph_mds_reply_info_in {
 	char *inline_data;
 	u32 pool_ns_len;
 	char *pool_ns_data;
+	u64 max_bytes;
+	u64 max_files;
 };
 
 struct ceph_mds_reply_dir_entry {
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
new file mode 100644
index 000000000000..69d74d7b73ad
--- /dev/null
+++ b/fs/ceph/quota.c
@@ -0,0 +1,59 @@
+/*
+ * quota.c - CephFS quota
+ *
+ * Copyright (C) 2017 SUSE
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "super.h"
+#include "mds_client.h"
+
+void ceph_handle_quota(struct ceph_mds_client *mdsc,
+		       struct ceph_mds_session *session,
+		       struct ceph_msg *msg)
+{
+	struct super_block *sb = mdsc->fsc->sb;
+	struct ceph_mds_quota *h = msg->front.iov_base;
+	struct ceph_vino vino;
+	struct inode *inode;
+	struct ceph_inode_info *ci;
+
+	if (msg->front.iov_len != sizeof(*h)) {
+		pr_err("ceph_handle_quota corrupt message mds%d len %d\n",
+		       session->s_mds, (int)msg->front.iov_len);
+		ceph_msg_dump(msg);
+		return;
+	}
+
+	/* lookup inode */
+	vino.ino = le64_to_cpu(h->ino);
+	vino.snap = CEPH_NOSNAP;
+	inode = ceph_find_inode(sb, vino);
+	ci = ceph_inode(inode);
+
+	mutex_lock(&session->s_mutex);
+	session->s_seq++;
+	mutex_unlock(&session->s_mutex);
+
+	spin_lock(&ci->i_ceph_lock);
+	ci->i_rbytes = le64_to_cpu(h->rbytes);
+	ci->i_rfiles = le64_to_cpu(h->rfiles);
+	ci->i_rsubdirs = le64_to_cpu(h->rsubdirs);
+	ci->i_max_bytes = le64_to_cpu(h->max_bytes);
+	ci->i_max_files = le64_to_cpu(h->max_files);
+	spin_unlock(&ci->i_ceph_lock);
+
+	iput(inode);
+}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 6474e8d875b7..e3e68448f55c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -309,6 +309,9 @@ struct ceph_inode_info {
 	u64 i_rbytes, i_rfiles, i_rsubdirs;
 	u64 i_files, i_subdirs;
 
+	/* quotas */
+	u64 i_max_bytes, i_max_files;
+
 	struct rb_root i_fragtree;
 	int i_fragtree_nsplits;
 	struct mutex i_fragtree_mutex;
@@ -1021,4 +1024,9 @@ extern int ceph_locks_to_pagelist(struct ceph_filelock *flocks,
 extern int ceph_fs_debugfs_init(struct ceph_fs_client *client);
 extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
 
+/* quota.c */
+extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
+			      struct ceph_mds_session *session,
+			      struct ceph_msg *msg);
+
 #endif /* _FS_CEPH_SUPER_H */
diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
index e1c4e0b12b4c..cfc3028be0fa 100644
--- a/fs/ceph/xattr.c
+++ b/fs/ceph/xattr.c
@@ -224,6 +224,31 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
 			(long)ci->i_rctime.tv_nsec);
 }
 
+/* quotas */
+
+static bool ceph_vxattrcb_quota_exists(struct ceph_inode_info *ci)
+{
+	return (ci->i_max_files || ci->i_max_bytes);
+}
+
+static size_t ceph_vxattrcb_quota(struct ceph_inode_info *ci, char *val,
+				  size_t size)
+{
+	return snprintf(val, size, "max_bytes=%llu max_files=%llu",
+			ci->i_max_bytes, ci->i_max_files);
+}
+
+static size_t ceph_vxattrcb_quota_max_bytes(struct ceph_inode_info *ci,
+					    char *val, size_t size)
+{
+	return snprintf(val, size, "%llu", ci->i_max_bytes);
+}
+
+static size_t ceph_vxattrcb_quota_max_files(struct ceph_inode_info *ci,
+					    char *val, size_t size)
+{
+	return snprintf(val, size, "%llu", ci->i_max_files);
+}
 
 #define CEPH_XATTR_NAME(_type, _name)	XATTR_CEPH_PREFIX #_type "." #_name
 #define CEPH_XATTR_NAME2(_type, _name, _name2)	\
@@ -247,6 +272,15 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
 		.hidden = true,			\
 		.exists_cb = ceph_vxattrcb_layout_exists,	\
 	}
+#define XATTR_QUOTA_FIELD(_type, _name)					\
+	{								\
+		.name = CEPH_XATTR_NAME(_type, _name),			\
+		.name_size = sizeof (CEPH_XATTR_NAME(_type, _name)),	\
+		.getxattr_cb = ceph_vxattrcb_ ## _type ## _ ## _name,	\
+		.readonly = false,					\
+		.hidden = true,						\
+		.exists_cb = ceph_vxattrcb_quota_exists,		\
+	}
 
 static struct ceph_vxattr ceph_dir_vxattrs[] = {
 	{
@@ -270,6 +304,16 @@ static struct ceph_vxattr ceph_dir_vxattrs[] = {
 	XATTR_NAME_CEPH(dir, rsubdirs),
 	XATTR_NAME_CEPH(dir, rbytes),
 	XATTR_NAME_CEPH(dir, rctime),
+	{
+		.name = "ceph.quota",
+		.name_size = sizeof("ceph.quota"),
+		.getxattr_cb = ceph_vxattrcb_quota,
+		.readonly = false,
+		.hidden = true,
+		.exists_cb = ceph_vxattrcb_quota_exists,
+	},
+	XATTR_QUOTA_FIELD(quota, max_bytes),
+	XATTR_QUOTA_FIELD(quota, max_files),
 	{ .name = NULL, 0 }	/* Required table terminator */
 };
 static size_t ceph_dir_vxattrs_name_size;	/* total size of all names */
diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h
index 59042d5ac520..6acd46c36271 100644
--- a/include/linux/ceph/ceph_features.h
+++ b/include/linux/ceph/ceph_features.h
@@ -209,7 +209,8 @@ DEFINE_CEPH_FEATURE_DEPRECATED(63, 1, RESERVED_BROKEN, LUMINOUS) // client-facin
 	 CEPH_FEATURE_SERVER_JEWEL |		\
 	 CEPH_FEATURE_MON_STATEFUL_SUB |	\
 	 CEPH_FEATURE_CRUSH_TUNABLES5 |		\
-	 CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING)
+	 CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING |	\
+	 CEPH_FEATURE_MDS_QUOTA)
 
 #define CEPH_FEATURES_REQUIRED_DEFAULT   \
 	(CEPH_FEATURE_NOSRCADDR |	 \
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 88dd51381aaf..98bdcc0eda3f 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -134,6 +134,7 @@ struct ceph_dir_layout {
 #define CEPH_MSG_CLIENT_LEASE           0x311
 #define CEPH_MSG_CLIENT_SNAP            0x312
 #define CEPH_MSG_CLIENT_CAPRELEASE      0x313
+#define CEPH_MSG_CLIENT_QUOTA		0x314
 
 /* pool ops */
 #define CEPH_MSG_POOLOP_REPLY           48
@@ -807,4 +808,20 @@ struct ceph_mds_snap_realm {
 } __attribute__ ((packed));
 /* followed by my snap list, then prior parent snap list */
 
+/*
+ * quotas
+ */
+struct ceph_mds_quota {
+	__le64 ino;		/* ino */
+	struct ceph_timespec rctime;
+	__le64 rbytes;		/* dir stats */
+	__le64 rfiles;
+	__le64 rsubdirs;
+	__u8 struct_v;		/* compat */
+	__u8 struct_compat;
+	__le32 struct_len;
+	__le64 max_bytes;	/* quota max. bytes */
+	__le64 max_files;	/* quota max. files */
+} __attribute__ ((packed));
+
 #endif


* [RFC v2 PATCH 3/4] ceph: quotas: support for ceph.quota.max_files
  2017-12-18 15:38 [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support Luis Henriques
  2017-12-18 15:38 ` [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection Luis Henriques
  2017-12-18 15:39 ` [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
@ 2017-12-18 15:39 ` Luis Henriques
  2017-12-18 15:39 ` [RFC v2 PATCH 4/4] ceph: quota: don't allow cross-quota renames Luis Henriques
  3 siblings, 0 replies; 9+ messages in thread
From: Luis Henriques @ 2017-12-18 15:39 UTC
  To: ceph-devel; +Cc: Yan, Zheng, Jeff Layton, Jan Fajerski, Luis Henriques

This patch adds support for the max_files quota.  It hooks into all the
ceph functions that add new filesystem objects that need to be checked
against the quota limits.  When these limits are hit, -EDQUOT is returned.

Note that we're not checking quotas in ceph_link().  ceph_link() doesn't
really create a new inode, and the MDS doesn't update the directory
statistics when a new hard link is created (it only does so for symlinks),
so hard links are not accounted as new files.
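
The check itself can be sketched as a simple walk towards the root.
This is a simplified userspace model: struct realm and
max_files_exceeded are hypothetical names, and the sketch leaves out the
reference counting, locking and seqlock retry that the patch needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a realm and its root directory. */
struct realm {
	struct realm *parent;	/* NULL at the filesystem root */
	uint64_t max_files;	/* 0 means "no limit" */
	uint64_t rfiles;	/* recursive file count */
	uint64_t rsubdirs;	/* recursive subdirectory count */
};

/*
 * Return true if creating one more object below 'r' must be refused:
 * some realm on the way to the root has a limit and its recursive
 * object count (files plus subdirectories) has already reached it.
 */
static bool max_files_exceeded(const struct realm *r)
{
	assert(r != NULL);	/* caller passes the realm of an existing dir */
	for (; r; r = r->parent) {
		if (r->max_files && r->rfiles + r->rsubdirs >= r->max_files)
			return true;
	}
	return false;
}
```

Because the comparison is ">=", a limit of N allows at most N files and
subdirectories in total beneath the directory that carries the quota.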

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/dir.c   | 11 +++++++++
 fs/ceph/file.c  |  4 ++-
 fs/ceph/quota.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h |  1 +
 4 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 8a5266699b67..66550d92b1ac 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -818,6 +818,9 @@ static int ceph_mknod(struct inode *dir, struct dentry *dentry,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return -EROFS;
 
+	if (ceph_quota_is_max_files_exceeded(dir))
+		return -EDQUOT;
+
 	err = ceph_pre_init_acls(dir, &mode, &acls);
 	if (err < 0)
 		return err;
@@ -871,6 +874,9 @@ static int ceph_symlink(struct inode *dir, struct dentry *dentry,
 	if (ceph_snap(dir) != CEPH_NOSNAP)
 		return -EROFS;
 
+	if (ceph_quota_is_max_files_exceeded(dir))
+		return -EDQUOT;
+
 	dout("symlink in dir %p dentry %p to '%s'\n", dir, dentry, dest);
 	req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SYMLINK, USE_AUTH_MDS);
 	if (IS_ERR(req)) {
@@ -920,6 +926,11 @@ static int ceph_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 		goto out;
 	}
 
+	if (ceph_quota_is_max_files_exceeded(dir)) {
+		err = -EDQUOT;
+		goto out;
+	}
+
 	mode |= S_IFDIR;
 	err = ceph_pre_init_acls(dir, &mode, &acls);
 	if (err < 0)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 5c17125f45c7..5a77a66e3d6b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -371,7 +371,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 	struct ceph_mds_request *req;
 	struct dentry *dn;
 	struct ceph_acls_info acls = {};
-       int mask;
+	int mask;
 	int err;
 
 	dout("atomic_open %p dentry %p '%pd' %s flags %d mode 0%o\n",
@@ -382,6 +382,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
 		return -ENAMETOOLONG;
 
 	if (flags & O_CREAT) {
+		if (ceph_quota_is_max_files_exceeded(dir))
+			return -EDQUOT;
 		err = ceph_pre_init_acls(dir, &mode, &acls);
 		if (err < 0)
 			return err;
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index 69d74d7b73ad..06f28f11be25 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -57,3 +57,78 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
 
 	iput(inode);
 }
+
+/*
+ * check_quota_exceeded() will walk up the snaprealm hierarchy and, for each
+ * realm, it will execute quota check operation defined by the 'op' parameter.
+ * The snaprealm walk is interrupted if the quota check detects that the quota
+ * is exceeded or if the root inode is reached.
+ * The whole operation is restarted if a snaprealm change is detected through
+ * the snaprealm_lock seqlock.
+ */
+enum quota_check_op {
+	QUOTA_CHECK_MAX_FILES_OP /* check quota max_files limit */
+};
+
+static bool check_quota_exceeded(struct inode *inode, enum quota_check_op op,
+				 loff_t size)
+{
+	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
+	struct ceph_inode_info *ci;
+	struct ceph_snap_realm *realm, *next;
+	struct ceph_vino vino;
+	struct inode *ino;
+	u64 max = 0, rvalue = 0;
+	bool quota_exceeded, is_root;
+	unsigned seq;
+
+	WARN_ON(!S_ISDIR(inode->i_mode));
+retry:
+	quota_exceeded = false;
+	seq = read_seqbegin(&snaprealm_lock);
+	realm = ceph_inode(inode)->i_snap_realm;
+	ceph_get_snap_realm(mdsc, realm);
+	while (realm) {
+		vino.ino = realm->ino;
+		vino.snap = CEPH_NOSNAP;
+		ino = ceph_find_inode(inode->i_sb, vino);
+		if (!ino) {
+			pr_warn("Failed to find inode for %llu\n", vino.ino);
+			break;
+		}
+		ci = ceph_inode(ino);
+		switch(op) {
+		case QUOTA_CHECK_MAX_FILES_OP:
+			spin_lock(&ci->i_ceph_lock);
+			max = ci->i_max_files;
+			rvalue = ci->i_rfiles + ci->i_rsubdirs;
+			is_root = (ci->i_vino.ino == CEPH_INO_ROOT);
+			spin_unlock(&ci->i_ceph_lock);
+			quota_exceeded = (max && (rvalue >= max));
+			break;
+		default:
+			/* Shouldn't happen */
+			pr_warn("Invalid quota check op (%d)\n", op);
+			is_root = true; /* Just break the loop */
+		}
+		iput(ino);
+
+		if (quota_exceeded || is_root)
+			break;
+		next = realm->parent;
+		ceph_get_snap_realm(mdsc, next);
+		ceph_put_snap_realm(mdsc, realm);
+		realm = next;
+	}
+	ceph_put_snap_realm(mdsc, realm);
+
+	if (read_seqretry(&snaprealm_lock, seq))
+		goto retry;
+
+	return quota_exceeded;
+}
+
+bool ceph_quota_is_max_files_exceeded(struct inode *inode)
+{
+	return check_quota_exceeded(inode, QUOTA_CHECK_MAX_FILES_OP, 0);
+}
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index e3e68448f55c..a83847d6f8f9 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1028,5 +1028,6 @@ extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
 extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
 			      struct ceph_mds_session *session,
 			      struct ceph_msg *msg);
+extern bool ceph_quota_is_max_files_exceeded(struct inode *inode);
 
 #endif /* _FS_CEPH_SUPER_H */


* [RFC v2 PATCH 4/4] ceph: quota: don't allow cross-quota renames
  2017-12-18 15:38 [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support Luis Henriques
                   ` (2 preceding siblings ...)
  2017-12-18 15:39 ` [RFC v2 PATCH 3/4] ceph: quotas: support for ceph.quota.max_files Luis Henriques
@ 2017-12-18 15:39 ` Luis Henriques
  3 siblings, 0 replies; 9+ messages in thread
From: Luis Henriques @ 2017-12-18 15:39 UTC
  To: ceph-devel; +Cc: Yan, Zheng, Jeff Layton, Jan Fajerski, Luis Henriques

This patch changes ceph_rename() so that -EXDEV is returned if an attempt is
made to move a file between two directory trees that have different quotas
set up.
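
The rule can be sketched in userspace C as "find the nearest ancestor
with a quota, falling back to the root, and compare".  The names qrealm,
quota_realm and check_rename are hypothetical; the sketch leaves out the
inode lookups, reference counting and seqlock retry that the patch
needs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a realm and its quota settings. */
struct qrealm {
	struct qrealm *parent;	/* NULL at the filesystem root */
	uint64_t max_files;	/* 0 means "not set" */
	uint64_t max_bytes;	/* 0 means "not set" */
};

/* Nearest realm (self included) with a quota set; the root if none has. */
static const struct qrealm *quota_realm(const struct qrealm *r)
{
	assert(r != NULL);
	while (r->parent && !(r->max_files || r->max_bytes))
		r = r->parent;
	return r;
}

/* 0 if a rename between the two directories is allowed, -EXDEV if not. */
static int check_rename(const struct qrealm *old_dir,
			const struct qrealm *new_dir)
{
	if (old_dir != new_dir &&
	    quota_realm(old_dir) != quota_realm(new_dir))
		return -EXDEV;
	return 0;
}
```

Returning -EXDEV makes the rename look like a cross-device move, which
tools such as GNU mv usually handle by copying the data and deleting the
source.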

Link: http://tracker.ceph.com/issues/22372
Signed-off-by: Luis Henriques <lhenriques@suse.com>
---
 fs/ceph/dir.c   |  5 +++++
 fs/ceph/quota.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ceph/super.h |  1 +
 3 files changed, 69 insertions(+)

diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index 66550d92b1ac..f6ac16caa1e9 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1090,6 +1090,11 @@ static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
 		else
 			return -EROFS;
 	}
+	/* don't allow cross-quota renames */
+	if ((old_dir != new_dir) &&
+	    (!ceph_quota_is_same_realm(old_dir, new_dir)))
+		return -EXDEV;
+
 	dout("rename dir %p dentry %p to dir %p dentry %p\n",
 	     old_dir, old_dentry, new_dir, new_dentry);
 	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
index 06f28f11be25..119e16ce793b 100644
--- a/fs/ceph/quota.c
+++ b/fs/ceph/quota.c
@@ -20,6 +20,11 @@
 #include "super.h"
 #include "mds_client.h"
 
+static inline bool ceph_has_quota(struct ceph_inode_info *ci)
+{
+	return (ci && (ci->i_max_files || ci->i_max_bytes));
+}
+
 void ceph_handle_quota(struct ceph_mds_client *mdsc,
 		       struct ceph_mds_session *session,
 		       struct ceph_msg *msg)
@@ -58,6 +63,64 @@ void ceph_handle_quota(struct ceph_mds_client *mdsc,
 	iput(inode);
 }
 
+/*
+ * This function walks through the snaprealm for an inode and returns the
+ * ceph_inode_info for the first snaprealm that has quotas set (either max_files
+ * or max_bytes).  If the root is reached, return the root ceph_inode_info
+ * instead.
+ *
+ * Note that this snaprealm walk isn't protected by snaprealm_lock; that shall
+ * be done by the caller.
+ */
+static struct ceph_inode_info *get_quota_realm(struct inode *inode)
+{
+	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
+	struct ceph_inode_info *ci = NULL;
+	struct ceph_snap_realm *realm, *next;
+	struct ceph_vino vino;
+	struct inode *ino;
+
+	realm = ceph_inode(inode)->i_snap_realm;
+	ceph_get_snap_realm(mdsc, realm);
+	while (realm) {
+		vino.ino = realm->ino;
+		vino.snap = CEPH_NOSNAP;
+		ino = ceph_find_inode(inode->i_sb, vino);
+		if (!ino) {
+			pr_warn("Failed to find inode for %llu\n", vino.ino);
+			break;
+		}
+		ci = ceph_inode(ino);
+		if (ceph_has_quota(ci) || (ci->i_vino.ino == CEPH_INO_ROOT)) {
+			iput(ino);
+			break;
+		}
+		iput(ino);
+		next = realm->parent;
+		ceph_get_snap_realm(mdsc, next);
+		ceph_put_snap_realm(mdsc, realm);
+		realm = next;
+	}
+	ceph_put_snap_realm(mdsc, realm);
+
+	return ci;
+}
+
+bool ceph_quota_is_same_realm(struct inode *old, struct inode *new)
+{
+	struct ceph_inode_info *ci_old, *ci_new;
+	unsigned seq;
+
+retry:
+	seq = read_seqbegin(&snaprealm_lock);
+	ci_old = get_quota_realm(old);
+	ci_new = get_quota_realm(new);
+	if (read_seqretry(&snaprealm_lock, seq))
+		goto retry;
+
+	return (ci_old == ci_new);
+}
+
 /*
  * check_quota_exceeded() will walk up the snaprealm hierarchy and, for each
  * realm, it will execute quota check operation defined by the 'op' parameter.
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index a83847d6f8f9..d8c8baaf049c 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -1029,5 +1029,6 @@ extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
 			      struct ceph_mds_session *session,
 			      struct ceph_msg *msg);
 extern bool ceph_quota_is_max_files_exceeded(struct inode *inode);
+extern bool ceph_quota_is_same_realm(struct inode *old, struct inode *new);
 
 #endif /* _FS_CEPH_SUPER_H */


* Re: [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection
  2017-12-18 15:38 ` [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection Luis Henriques
@ 2017-12-19  9:22   ` Yan, Zheng
  2017-12-19 10:57     ` Luis Henriques
  0 siblings, 1 reply; 9+ messages in thread
From: Yan, Zheng @ 2017-12-19  9:22 UTC
  To: Luis Henriques; +Cc: ceph-devel, Yan, Zheng, Jeff Layton, Jan Fajerski

On Mon, Dec 18, 2017 at 11:38 PM, Luis Henriques <lhenriques@suse.com> wrote:
> It is possible to receive an update to the snaprealms hierarchy from an
> MDS while walking through this hierarchy.  This patch adds a mechanism
> similar to the one used in dcache to detect renames in lookups.  A new
> seqlock is used to allow a retry in case a change has occurred while
> walking through the snaprealms.
>
> Link: http://tracker.ceph.com/issues/22372
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
>  fs/ceph/snap.c  | 45 +++++++++++++++++++++++++++++++++++++++------
>  fs/ceph/super.h |  2 ++
>  2 files changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 8a2ca41e4b97..8b9d6c7c0df4 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -54,6 +54,25 @@
>   * console).
>   */
>
> +/*
> + * While walking through the snaprealm hierarchy it is possible that
> + * this hierarchy is updated (e.g., when a different client moves
> + * directories around).  snaprealm_lock isn't supposed to prevent this
> + * but, just like the rename_lock in dcache, to detect that this has
> + * happened so that a lookup can be retried.
> + *
> + * Here's a typical usage pattern for this lock:
> + *
> + * retry:
> + *     seq = read_seqbegin(&snaprealm_lock);
> + *     realm = ci->i_snap_realm;
> + *     ceph_get_snap_realm(mdsc, realm);
> + *     ... do stuff ...
> + *     ceph_put_snap_realm(mdsc, realm);
> + *     if (read_seqretry(&snaprealm_lock, seq))
> + *             goto retry;
> + */

A seq lock is not enough to protect the snaprealm hierarchy walk.  The
code may access a snaprealm that has been freed by another thread.  If we
really want to use a seq lock here, we need to use kfree_rcu to free the
snaprealm data structure and use rcu_read_lock to protect the
hierarchy walk code.

> +DEFINE_SEQLOCK(snaprealm_lock);
>
>  /*
>   * increase ref count for the realm
> @@ -81,10 +100,12 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
>  static void __insert_snap_realm(struct rb_root *root,
>                                 struct ceph_snap_realm *new)
>  {
> -       struct rb_node **p = &root->rb_node;
> +       struct rb_node **p;
>         struct rb_node *parent = NULL;
>         struct ceph_snap_realm *r = NULL;
>
> +       write_seqlock(&snaprealm_lock);
> +       p  = &root->rb_node;
>         while (*p) {
>                 parent = *p;
>                 r = rb_entry(parent, struct ceph_snap_realm, node);
> @@ -98,6 +119,7 @@ static void __insert_snap_realm(struct rb_root *root,
>
>         rb_link_node(&new->node, parent, p);
>         rb_insert_color(&new->node, root);
> +       write_sequnlock(&snaprealm_lock);
>  }

Adding a snaprealm to, or removing one from, mdsc->snap_realms does not
directly change the snaprealm hierarchy.  The places that change the
snaprealm hierarchy should be adjust_snap_realm_parent() and the code
block in ceph_handle_snap() that handles CEPH_SNAP_OP_SPLIT.

The code block in ceph_handle_snap() that handles CEPH_SNAP_OP_SPLIT
may require lots of CPU cycles, so it is not suitable for a seqlock.

>
>  /*
> @@ -136,9 +158,14 @@ static struct ceph_snap_realm *ceph_create_snap_realm(
>  static struct ceph_snap_realm *__lookup_snap_realm(struct ceph_mds_client *mdsc,
>                                                    u64 ino)
>  {
> -       struct rb_node *n = mdsc->snap_realms.rb_node;
> -       struct ceph_snap_realm *r;
> -
> +       struct rb_node *n;
> +       struct ceph_snap_realm *realm, *r;
> +       unsigned seq;
> +
> +retry:
> +       realm = NULL;
> +       seq = read_seqbegin(&snaprealm_lock);
> +       n = mdsc->snap_realms.rb_node;
>         while (n) {
>                 r = rb_entry(n, struct ceph_snap_realm, node);
>                 if (ino < r->ino)
> @@ -147,10 +174,14 @@ static struct ceph_snap_realm *__lookup_snap_realm(struct ceph_mds_client *mdsc,
>                         n = n->rb_right;
>                 else {
>                         dout("lookup_snap_realm %llx %p\n", r->ino, r);
> -                       return r;
> +                       realm = r;
> +                       break;
>                 }
>         }
> -       return NULL;
> +
> +       if (read_seqretry(&snaprealm_lock, seq))
> +               goto retry;
> +       return realm;
>  }

The caller of __lookup_snap_realm() should hold mdsc->snap_rwsem, so
there is no need to use a seqlock here.

>
>  struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc,
> @@ -174,7 +205,9 @@ static void __destroy_snap_realm(struct ceph_mds_client *mdsc,
>  {
>         dout("__destroy_snap_realm %p %llx\n", realm, realm->ino);
>
> +       write_seqlock(&snaprealm_lock);
>         rb_erase(&realm->node, &mdsc->snap_realms);
> +       write_sequnlock(&snaprealm_lock);
>
>         if (realm->parent) {
>                 list_del_init(&realm->child_item);
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 2beeec07fa76..6474e8d875b7 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -760,6 +760,8 @@ static inline int default_congestion_kb(void)
>
>
>  /* snap.c */
> +extern seqlock_t snaprealm_lock;
> +
>  struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc,
>                                                u64 ino);
>  extern void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
> --

For the above reasons, I think we had better not introduce the new
seqlock.  Just take mdsc->snap_rwsem for reading when walking the
snaprealm hierarchy.

Regards
Yan, Zheng

> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas
  2017-12-18 15:39 ` [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
@ 2017-12-19  9:24   ` Yan, Zheng
  2017-12-19 10:59     ` Luis Henriques
  0 siblings, 1 reply; 9+ messages in thread
From: Yan, Zheng @ 2017-12-19  9:24 UTC (permalink / raw)
  To: Luis Henriques; +Cc: ceph-devel, Yan, Zheng, Jeff Layton, Jan Fajerski

On Mon, Dec 18, 2017 at 11:39 PM, Luis Henriques <lhenriques@suse.com> wrote:
> This patch adds the infrastructure required to support cephfs quotas as it
> is currently implemented in the ceph fuse client.  Cephfs quotas can be
> set on any directory, and can restrict the number of bytes or the number
> of files stored beneath that point in the directory hierarchy.
>
> Quotas are set using the extended attributes 'ceph.quota.max_files' and
> 'ceph.quota.max_bytes', and can be removed by setting these attributes to
> '0'.
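
For reference, setting such a quota from userspace could look like the
sketch below.  It assumes only the xattr name given in the commit
message and the standard Linux setxattr(2) call; the helper name
set_max_files_quota() is hypothetical, and passing the value as a
decimal string is an assumption based on the '0' in the description.

```c
#include <stdio.h>
#include <sys/xattr.h>

/*
 * Set ceph.quota.max_files on directory 'dir'.  A limit of 0 removes
 * the quota, as described in the commit message.  Returns 0 on
 * success, or -1 with errno set by setxattr(2).
 */
static int set_max_files_quota(const char *dir, unsigned long long limit)
{
	char val[32];
	int len = snprintf(val, sizeof(val), "%llu", limit);

	/* The value is passed as a decimal string without the trailing NUL. */
	return setxattr(dir, "ceph.quota.max_files", val, (size_t)len, 0);
}
```

Calling set_max_files_quota("/mnt/cephfs/project", 10000) on a cephfs
mount (the path is only an example) would request a limit of 10000
files beneath that directory.
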
>
> Link: http://tracker.ceph.com/issues/22372
> Signed-off-by: Luis Henriques <lhenriques@suse.com>
> ---
>  fs/ceph/Makefile                   |  2 +-
>  fs/ceph/inode.c                    |  6 ++++
>  fs/ceph/mds_client.c               | 23 +++++++++++++++
>  fs/ceph/mds_client.h               |  2 ++
>  fs/ceph/quota.c                    | 59 ++++++++++++++++++++++++++++++++++++++
>  fs/ceph/super.h                    |  8 ++++++
>  fs/ceph/xattr.c                    | 44 ++++++++++++++++++++++++++++
>  include/linux/ceph/ceph_features.h |  3 +-
>  include/linux/ceph/ceph_fs.h       | 17 +++++++++++
>  9 files changed, 162 insertions(+), 2 deletions(-)
>  create mode 100644 fs/ceph/quota.c
>
> diff --git a/fs/ceph/Makefile b/fs/ceph/Makefile
> index 174f5709e508..a699e320393f 100644
> --- a/fs/ceph/Makefile
> +++ b/fs/ceph/Makefile
> @@ -6,7 +6,7 @@
>  obj-$(CONFIG_CEPH_FS) += ceph.o
>
>  ceph-y := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
> -       export.o caps.o snap.o xattr.o \
> +       export.o caps.o snap.o xattr.o quota.o \
>         mds_client.o mdsmap.o strings.o ceph_frag.o \
>         debugfs.o
>
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index ab81652198c4..8a0ba96e105d 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -441,6 +441,9 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
>         atomic64_set(&ci->i_complete_seq[1], 0);
>         ci->i_symlink = NULL;
>
> +       ci->i_max_bytes = 0;
> +       ci->i_max_files = 0;
> +
>         memset(&ci->i_dir_layout, 0, sizeof(ci->i_dir_layout));
>         RCU_INIT_POINTER(ci->i_layout.pool_ns, NULL);
>
> @@ -790,6 +793,9 @@ static int fill_inode(struct inode *inode, struct page *locked_page,
>         inode->i_rdev = le32_to_cpu(info->rdev);
>         inode->i_blkbits = fls(le32_to_cpu(info->layout.fl_stripe_unit)) - 1;
>
> +       ci->i_max_bytes = iinfo->max_bytes;
> +       ci->i_max_files = iinfo->max_files;
> +
>         if ((new_version || (new_issued & CEPH_CAP_AUTH_SHARED)) &&
>             (issued & CEPH_CAP_AUTH_EXCL) == 0) {
>                 inode->i_mode = le32_to_cpu(info->mode);
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 1b468250e947..2290056d13fc 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -100,6 +100,26 @@ static int parse_reply_info_in(void **p, void *end,
>         } else
>                 info->inline_version = CEPH_INLINE_NONE;
>
> +       if (features & CEPH_FEATURE_MDS_QUOTA) {
> +               u8 struct_v, struct_compat;
> +               u32 struct_len;
> +
> +               /*
> +                * both struct_v and struct_compat are expected to be >= 1
> +                */
> +               ceph_decode_8_safe(p, end, struct_v, bad);
> +               ceph_decode_8_safe(p, end, struct_compat, bad);
> +               if (!struct_v || !struct_compat)
> +                       goto bad;
> +               ceph_decode_32_safe(p, end, struct_len, bad);
> +               ceph_decode_need(p, end, struct_len, bad);
> +               ceph_decode_64_safe(p, end, info->max_bytes, bad);
> +               ceph_decode_64_safe(p, end, info->max_files, bad);
> +       } else {
> +               info->max_bytes = 0;
> +               info->max_files = 0;
> +       }
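
The decode above reads struct_v, struct_compat and struct_len, then the
two quota fields.  Below is a userspace sketch of the same decode, with
one addition that is my assumption and not part of the patch: after
reading the known fields it skips to the end of the structure using
struct_len, so that a longer encoding from a newer sender would not
leave the cursor in the middle of the struct.  All names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct quota_info_model {
	uint64_t max_bytes;
	uint64_t max_files;
};

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Decode a versioned quota block at *pp.  Returns 0 and advances *pp
 * past the whole block (header + struct_len bytes) on success, or -1
 * if the buffer is too short or the version fields are zero.
 */
static int decode_quota_model(const uint8_t **pp, const uint8_t *end,
			      struct quota_info_model *out)
{
	const uint8_t *p = *pp;
	uint32_t struct_len;

	if ((size_t)(end - p) < 6)
		return -1;
	if (!p[0] || !p[1])		/* struct_v, struct_compat >= 1 */
		return -1;
	struct_len = get_le32(p + 2);
	p += 6;
	if (struct_len < 16 || (size_t)(end - p) < struct_len)
		return -1;

	out->max_bytes = get_le64(p);
	out->max_files = get_le64(p + 8);

	*pp = p + struct_len;		/* skip any fields we don't know */
	return 0;
}
```
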
> +
>         info->pool_ns_len = 0;
>         info->pool_ns_data = NULL;
>         if (features & CEPH_FEATURE_FS_FILE_LAYOUT_V2) {
> @@ -4064,6 +4084,9 @@ static void dispatch(struct ceph_connection *con, struct ceph_msg *msg)
>         case CEPH_MSG_CLIENT_LEASE:
>                 handle_lease(mdsc, s, msg);
>                 break;
> +       case CEPH_MSG_CLIENT_QUOTA:
> +               ceph_handle_quota(mdsc, s, msg);
> +               break;
>
>         default:
>                 pr_err("received unknown message type %d %s\n", type,
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index 837ac4b087a0..7af576733948 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -49,6 +49,8 @@ struct ceph_mds_reply_info_in {
>         char *inline_data;
>         u32 pool_ns_len;
>         char *pool_ns_data;
> +       u64 max_bytes;
> +       u64 max_files;
>  };
>
>  struct ceph_mds_reply_dir_entry {
> diff --git a/fs/ceph/quota.c b/fs/ceph/quota.c
> new file mode 100644
> index 000000000000..69d74d7b73ad
> --- /dev/null
> +++ b/fs/ceph/quota.c
> @@ -0,0 +1,59 @@
> +/*
> + * quota.c - CephFS quota
> + *
> + * Copyright (C) 2017 SUSE
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "super.h"
> +#include "mds_client.h"
> +
> +void ceph_handle_quota(struct ceph_mds_client *mdsc,
> +                      struct ceph_mds_session *session,
> +                      struct ceph_msg *msg)
> +{
> +       struct super_block *sb = mdsc->fsc->sb;
> +       struct ceph_mds_quota *h = msg->front.iov_base;
> +       struct ceph_vino vino;
> +       struct inode *inode;
> +       struct ceph_inode_info *ci;
> +
> +       if (msg->front.iov_len != sizeof(*h)) {
> +               pr_err("ceph_handle_quota corrupt message mds%d len %d\n",
> +                      session->s_mds, (int)msg->front.iov_len);
> +               ceph_msg_dump(msg);
> +               return;
> +       }
> +
> +       /* lookup inode */
> +       vino.ino = le64_to_cpu(h->ino);
> +       vino.snap = CEPH_NOSNAP;
> +       inode = ceph_find_inode(sb, vino);

ceph_find_inode() can return NULL here; check for that before using
the inode.
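
A minimal userspace sketch of the shape such a check could take is
below.  The lookup table and the names find_inode_model() and
handle_quota_model() are hypothetical stand-ins for the inode cache
and the handler; the point is only that the handler returns early when
the lookup finds nothing, instead of dereferencing the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached inode with quota fields. */
struct inode_model {
	uint64_t ino;
	uint64_t max_bytes;
	uint64_t max_files;
};

/* Returns the matching entry, or NULL if 'ino' is not cached. */
static struct inode_model *find_inode_model(struct inode_model *tbl,
					    size_t n, uint64_t ino)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].ino == ino)
			return &tbl[i];
	return NULL;
}

/* Returns 0 if the quota was updated, -1 if the inode was not found. */
static int handle_quota_model(struct inode_model *tbl, size_t n,
			      uint64_t ino, uint64_t max_bytes,
			      uint64_t max_files)
{
	struct inode_model *inode = find_inode_model(tbl, n, ino);

	if (!inode)
		return -1;	/* not cached: nothing to update */

	inode->max_bytes = max_bytes;
	inode->max_files = max_files;
	return 0;
}
```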

> +       ci = ceph_inode(inode);
> +
> +       mutex_lock(&session->s_mutex);
> +       session->s_seq++;
> +       mutex_unlock(&session->s_mutex);
> +
> +       spin_lock(&ci->i_ceph_lock);
> +       ci->i_rbytes = le64_to_cpu(h->rbytes);
> +       ci->i_rfiles = le64_to_cpu(h->rfiles);
> +       ci->i_rsubdirs = le64_to_cpu(h->rsubdirs);
> +       ci->i_max_bytes = le64_to_cpu(h->max_bytes);
> +       ci->i_max_files = le64_to_cpu(h->max_files);
> +       spin_unlock(&ci->i_ceph_lock);
> +
> +       iput(inode);
> +}
> diff --git a/fs/ceph/super.h b/fs/ceph/super.h
> index 6474e8d875b7..e3e68448f55c 100644
> --- a/fs/ceph/super.h
> +++ b/fs/ceph/super.h
> @@ -309,6 +309,9 @@ struct ceph_inode_info {
>         u64 i_rbytes, i_rfiles, i_rsubdirs;
>         u64 i_files, i_subdirs;
>
> +       /* quotas */
> +       u64 i_max_bytes, i_max_files;
> +
>         struct rb_root i_fragtree;
>         int i_fragtree_nsplits;
>         struct mutex i_fragtree_mutex;
> @@ -1021,4 +1024,9 @@ extern int ceph_locks_to_pagelist(struct ceph_filelock *flocks,
>  extern int ceph_fs_debugfs_init(struct ceph_fs_client *client);
>  extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
>
> +/* quota.c */
> +extern void ceph_handle_quota(struct ceph_mds_client *mdsc,
> +                             struct ceph_mds_session *session,
> +                             struct ceph_msg *msg);
> +
>  #endif /* _FS_CEPH_SUPER_H */
> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
> index e1c4e0b12b4c..cfc3028be0fa 100644
> --- a/fs/ceph/xattr.c
> +++ b/fs/ceph/xattr.c
> @@ -224,6 +224,31 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
>                         (long)ci->i_rctime.tv_nsec);
>  }
>
> +/* quotas */
> +
> +static bool ceph_vxattrcb_quota_exists(struct ceph_inode_info *ci)
> +{
> +       return (ci->i_max_files || ci->i_max_bytes);
> +}
> +
> +static size_t ceph_vxattrcb_quota(struct ceph_inode_info *ci, char *val,
> +                                 size_t size)
> +{
> +       return snprintf(val, size, "max_bytes=%llu max_files=%llu",
> +                       ci->i_max_bytes, ci->i_max_files);
> +}
> +
> +static size_t ceph_vxattrcb_quota_max_bytes(struct ceph_inode_info *ci,
> +                                           char *val, size_t size)
> +{
> +       return snprintf(val, size, "%llu", ci->i_max_bytes);
> +}
> +
> +static size_t ceph_vxattrcb_quota_max_files(struct ceph_inode_info *ci,
> +                                           char *val, size_t size)
> +{
> +       return snprintf(val, size, "%llu", ci->i_max_files);
> +}
>
>  #define CEPH_XATTR_NAME(_type, _name)  XATTR_CEPH_PREFIX #_type "." #_name
>  #define CEPH_XATTR_NAME2(_type, _name, _name2) \
> @@ -247,6 +272,15 @@ static size_t ceph_vxattrcb_dir_rctime(struct ceph_inode_info *ci, char *val,
>                 .hidden = true,                 \
>                 .exists_cb = ceph_vxattrcb_layout_exists,       \
>         }
> +#define XATTR_QUOTA_FIELD(_type, _name)                                        \
> +       {                                                               \
> +               .name = CEPH_XATTR_NAME(_type, _name),                  \
> +               .name_size = sizeof (CEPH_XATTR_NAME(_type, _name)),    \
> +               .getxattr_cb = ceph_vxattrcb_ ## _type ## _ ## _name,   \
> +               .readonly = false,                                      \
> +               .hidden = true,                                         \
> +               .exists_cb = ceph_vxattrcb_quota_exists,                \
> +       }
>
>  static struct ceph_vxattr ceph_dir_vxattrs[] = {
>         {
> @@ -270,6 +304,16 @@ static struct ceph_vxattr ceph_dir_vxattrs[] = {
>         XATTR_NAME_CEPH(dir, rsubdirs),
>         XATTR_NAME_CEPH(dir, rbytes),
>         XATTR_NAME_CEPH(dir, rctime),
> +       {
> +               .name = "ceph.quota",
> +               .name_size = sizeof("ceph.quota"),
> +               .getxattr_cb = ceph_vxattrcb_quota,
> +               .readonly = false,
> +               .hidden = true,
> +               .exists_cb = ceph_vxattrcb_quota_exists,
> +       },
> +       XATTR_QUOTA_FIELD(quota, max_bytes),
> +       XATTR_QUOTA_FIELD(quota, max_files),
>         { .name = NULL, 0 }     /* Required table terminator */
>  };
>  static size_t ceph_dir_vxattrs_name_size;      /* total size of all names */
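
To see what names the XATTR_QUOTA_FIELD() entries produce, the macro
expansion can be reproduced in userspace.  The sketch below copies
CEPH_XATTR_NAME() from the context lines above and assumes that
XATTR_CEPH_PREFIX is "ceph." (consistent with the "ceph.quota" entry,
but not shown in this patch).  The struct and the _MODEL macro are
trimmed-down, hypothetical versions that keep only the name fields.

```c
#include <string.h>

#define XATTR_CEPH_PREFIX "ceph."	/* assumed value, see above */
#define CEPH_XATTR_NAME(_type, _name)	XATTR_CEPH_PREFIX #_type "." #_name

/* Hypothetical subset of struct ceph_vxattr: name fields only. */
struct vxattr_model {
	const char *name;
	size_t name_size;	/* sizeof the literal, so it counts the NUL */
};

#define XATTR_QUOTA_FIELD_MODEL(_type, _name)				\
	{								\
		.name = CEPH_XATTR_NAME(_type, _name),			\
		.name_size = sizeof(CEPH_XATTR_NAME(_type, _name)),	\
	}

static const struct vxattr_model quota_vxattrs_model[] = {
	XATTR_QUOTA_FIELD_MODEL(quota, max_bytes),
	XATTR_QUOTA_FIELD_MODEL(quota, max_files),
};
```

Under that assumption the two entries are named "ceph.quota.max_bytes"
and "ceph.quota.max_files", matching the attribute names in the commit
message.
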
> diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h
> index 59042d5ac520..6acd46c36271 100644
> --- a/include/linux/ceph/ceph_features.h
> +++ b/include/linux/ceph/ceph_features.h
> @@ -209,7 +209,8 @@ DEFINE_CEPH_FEATURE_DEPRECATED(63, 1, RESERVED_BROKEN, LUMINOUS) // client-facin
>          CEPH_FEATURE_SERVER_JEWEL |            \
>          CEPH_FEATURE_MON_STATEFUL_SUB |        \
>          CEPH_FEATURE_CRUSH_TUNABLES5 |         \
> -        CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING)
> +        CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING | \
> +        CEPH_FEATURE_MDS_QUOTA)
>
>  #define CEPH_FEATURES_REQUIRED_DEFAULT   \
>         (CEPH_FEATURE_NOSRCADDR |        \
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 88dd51381aaf..98bdcc0eda3f 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -134,6 +134,7 @@ struct ceph_dir_layout {
>  #define CEPH_MSG_CLIENT_LEASE           0x311
>  #define CEPH_MSG_CLIENT_SNAP            0x312
>  #define CEPH_MSG_CLIENT_CAPRELEASE      0x313
> +#define CEPH_MSG_CLIENT_QUOTA          0x314
>
>  /* pool ops */
>  #define CEPH_MSG_POOLOP_REPLY           48
> @@ -807,4 +808,20 @@ struct ceph_mds_snap_realm {
>  } __attribute__ ((packed));
>  /* followed by my snap list, then prior parent snap list */
>
> +/*
> + * quotas
> + */
> +struct ceph_mds_quota {
> +       __le64 ino;             /* ino */
> +       struct ceph_timespec rctime;
> +       __le64 rbytes;          /* dir stats */
> +       __le64 rfiles;
> +       __le64 rsubdirs;
> +       __u8 struct_v;          /* compat */
> +       __u8 struct_compat;
> +       __le32 struct_len;
> +       __le64 max_bytes;       /* quota max. bytes */
> +       __le64 max_files;       /* quota max. files */
> +} __attribute__ ((packed));
> +
>  #endif
> --
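
Since ceph_handle_quota() compares msg->front.iov_len against
sizeof(*h), the packed layout of struct ceph_mds_quota fixes the exact
message length that is accepted.  The userspace sketch below mirrors
the field order with fixed-width types to show that size.  It assumes
struct ceph_timespec is two 32-bit fields, which is not shown in this
patch, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of struct ceph_timespec: two 32-bit fields. */
struct timespec_wire {
	uint32_t tv_sec;
	uint32_t tv_nsec;
} __attribute__((packed));

/* Same field order as struct ceph_mds_quota in the patch. */
struct quota_wire {
	uint64_t ino;
	struct timespec_wire rctime;
	uint64_t rbytes;
	uint64_t rfiles;
	uint64_t rsubdirs;
	uint8_t struct_v;
	uint8_t struct_compat;
	uint32_t struct_len;
	uint64_t max_bytes;
	uint64_t max_files;
} __attribute__((packed));

/* Mirrors the "iov_len != sizeof(*h)" check: only exact matches pass. */
static int quota_msg_len_ok(size_t front_len)
{
	return front_len == sizeof(struct quota_wire);
}
```

With this layout the struct is 62 bytes and max_bytes sits at offset
46.  Because the check is an exact comparison, a message of any other
length would be reported as corrupt.
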


* Re: [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection
  2017-12-19  9:22   ` Yan, Zheng
@ 2017-12-19 10:57     ` Luis Henriques
  0 siblings, 0 replies; 9+ messages in thread
From: Luis Henriques @ 2017-12-19 10:57 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, Jeff Layton, Jan Fajerski

"Yan, Zheng" <ukernel@gmail.com> writes:

> On Mon, Dec 18, 2017 at 11:38 PM, Luis Henriques <lhenriques@suse.com> wrote:
<snip>
>>  /* snap.c */
>> +extern seqlock_t snaprealm_lock;
>> +
>>  struct ceph_snap_realm *ceph_lookup_snap_realm(struct ceph_mds_client *mdsc,
>>                                                u64 ino);
>>  extern void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
>> --
>
> For the above reasons, I think we had better not introduce the new
> seqlock.  Just take mdsc->snap_rwsem for reading when walking the
> snaprealm hierarchy.

Thank you for the detailed review of this patch.  I guess my
understanding of the snaprealms handling wasn't quite correct.  Using
the snap_rwsem seems to make sense now after reading your comments and
re-reading the code.

I'll look at the code a bit more and eventually rework this patchset.
Dropping this patch will also allow simplifying patches 3 and 4.

Cheers,
-- 
Luis


* Re: [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas
  2017-12-19  9:24   ` Yan, Zheng
@ 2017-12-19 10:59     ` Luis Henriques
  0 siblings, 0 replies; 9+ messages in thread
From: Luis Henriques @ 2017-12-19 10:59 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, Jeff Layton, Jan Fajerski

"Yan, Zheng" <ukernel@gmail.com> writes:
<snip>
>> +       /* lookup inode */
>> +       vino.ino = le64_to_cpu(h->ino);
>> +       vino.snap = CEPH_NOSNAP;
>> +       inode = ceph_find_inode(sb, vino);
>
> ceph_find_inode() can return NULL here; check for that before using
> the inode.

Ah, nice catch.  I'll add that check, thanks.

Cheers,
-- 
Luis


Thread overview: 9+ messages
2017-12-18 15:38 [RFC v2 PATCH 0/4] ceph: kernel client cephfs quota support Luis Henriques
2017-12-18 15:38 ` [RFC v2 PATCH 1/4] ceph: add seqlock for snaprealm hierarchy change detection Luis Henriques
2017-12-19  9:22   ` Yan, Zheng
2017-12-19 10:57     ` Luis Henriques
2017-12-18 15:39 ` [RFC v2 PATCH 2/4] ceph: quota: add initial infrastructure to support cephfs quotas Luis Henriques
2017-12-19  9:24   ` Yan, Zheng
2017-12-19 10:59     ` Luis Henriques
2017-12-18 15:39 ` [RFC v2 PATCH 3/4] ceph: quotas: support for ceph.quota.max_files Luis Henriques
2017-12-18 15:39 ` [RFC v2 PATCH 4/4] ceph: quota: don't allow cross-quota renames Luis Henriques
