* [PATCH 00/11] Fixes for mds cluster
@ 2014-01-21  4:15 Yan, Zheng
  2014-01-21  4:15 ` [PATCH 02/11] mds: use ceph_seq_cmp() to compare migrate_seq Yan, Zheng
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

The last 5 patches are the client part of a protocol change that handles
corner cases of capability import/export. The MDS counterpart is at
  https://github.com/ceph/ceph.git wip-mds-cluster2

The remaining patches fix -ESTALE and other miscellaneous issues with MDS clusters.

These patches are also at:
  https://github.com/ceph/ceph-client.git wip-mds-cluster

Regards
Yan, Zheng

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 02/11] mds: use ceph_seq_cmp() to compare migrate_seq
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 03/11] ceph: fix cache revoke race Yan, Zheng
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 1012099..2c39d9f 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -628,7 +628,7 @@ retry:
 	cap->cap_id = cap_id;
 	cap->issued = issued;
 	cap->implemented |= issued;
-	if (mseq > cap->mseq)
+	if (ceph_seq_cmp(mseq, cap->mseq) > 0)
 		cap->mds_wanted = wanted;
 	else
 		cap->mds_wanted |= wanted;
-- 
1.8.4.2
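
For readers wondering why a plain `>` is replaced: migrate_seq is an unsigned 32-bit counter, so a direct comparison gives the wrong answer once the counter wraps. The sketch below illustrates a wraparound-aware comparison. The name seq_cmp_sketch() is hypothetical and is only assumed to behave like ceph_seq_cmp(); it is not copied from the kernel headers.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a wraparound-aware sequence comparison.
 * The 32-bit space is treated as circular: the result is negative,
 * zero or positive depending on whether a is "behind", equal to, or
 * "ahead of" b by less than half the counter range.
 */
static int seq_cmp_sketch(uint32_t a, uint32_t b)
{
	/* unsigned subtraction wraps; the cast reinterprets the distance */
	return (int32_t)(a - b);
}
```

With this form, a sequence number of 2 that follows 0xfffffffe is reported as newer, whereas `2 > 0xfffffffe` is false.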



* [PATCH 03/11] ceph: fix cache revoke race
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
  2014-01-21  4:15 ` [PATCH 02/11] mds: use ceph_seq_cmp() to compare migrate_seq Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 04/11] ceph: fix trim caps Yan, Zheng
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Handle the following sequence of events:

- A non-auth MDS revokes the Fc cap, and invalidate work is queued.
- The auth MDS issues the Fc cap through a request reply, so
  i_rdcache_gen is increased.
- The invalidate work runs. It finds i_rdcache_revoking != i_rdcache_gen,
  so it does nothing.
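
The sequence above can be modelled in a few lines. This is a simplified sketch, not the kernel code: struct inode_model and invalidate_work_model() are hypothetical names, and the boolean other_cap_revoking_fc stands in for what __ceph_caps_revoking_other(ci, NULL, CEPH_CAP_FILE_CACHE) reports.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the per-inode state involved in the race. */
struct inode_model {
	unsigned rdcache_gen;        /* bumped whenever Fc is (re)issued */
	unsigned rdcache_revoking;   /* generation recorded when the revoke was queued */
	bool other_cap_revoking_fc;  /* some cap still has Fc implemented but not issued */
};

/*
 * Returns true if the worker should follow up with a cap check.
 * 'patched' selects between the old and the new behaviour.
 */
static bool invalidate_work_model(const struct inode_model *ci, bool patched)
{
	if (ci->rdcache_revoking != ci->rdcache_gen) {
		/* the generation moved on, so page invalidation is skipped */
		if (patched && ci->other_cap_revoking_fc)
			return true;  /* a revoke is still pending: check caps */
		return false;         /* old behaviour: do nothing at all */
	}
	/* generations match: pages are invalidated, then caps are checked */
	return true;
}
```

In the racy state (generation already bumped, revoke still pending) the old path returns without any follow-up, while the patched path requests a cap check.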

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c  | 2 +-
 fs/ceph/inode.c | 8 +++++---
 fs/ceph/super.h | 2 ++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 2c39d9f..d2154d6 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -816,7 +816,7 @@ int __ceph_caps_revoking_other(struct ceph_inode_info *ci,
 
 	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
 		cap = rb_entry(p, struct ceph_cap, ci_node);
-		if (cap != ocap && __cap_is_valid(cap) &&
+		if (cap != ocap &&
 		    (cap->implemented & ~cap->issued & mask))
 			return 1;
 	}
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index a808bfb..3db97ba 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1466,7 +1466,8 @@ static void ceph_invalidate_work(struct work_struct *work)
 	dout("invalidate_pages %p gen %d revoking %d\n", inode,
 	     ci->i_rdcache_gen, ci->i_rdcache_revoking);
 	if (ci->i_rdcache_revoking != ci->i_rdcache_gen) {
-		/* nevermind! */
+		if (__ceph_caps_revoking_other(ci, NULL, CEPH_CAP_FILE_CACHE))
+			check = 1;
 		spin_unlock(&ci->i_ceph_lock);
 		mutex_unlock(&ci->i_truncate_mutex);
 		goto out;
@@ -1487,13 +1488,14 @@ static void ceph_invalidate_work(struct work_struct *work)
 		dout("invalidate_pages %p gen %d raced, now %d revoking %d\n",
 		     inode, orig_gen, ci->i_rdcache_gen,
 		     ci->i_rdcache_revoking);
+		if (__ceph_caps_revoking_other(ci, NULL, CEPH_CAP_FILE_CACHE))
+			check = 1;
 	}
 	spin_unlock(&ci->i_ceph_lock);
 	mutex_unlock(&ci->i_truncate_mutex);
-
+out:
 	if (check)
 		ceph_check_caps(ci, 0, NULL);
-out:
 	iput(inode);
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7fa78a7..891cda8 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -528,6 +528,8 @@ static inline int __ceph_caps_dirty(struct ceph_inode_info *ci)
 }
 extern int __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask);
 
+extern int __ceph_caps_revoking_other(struct ceph_inode_info *ci,
+				      struct ceph_cap *ocap, int mask);
 extern int ceph_caps_revoking(struct ceph_inode_info *ci, int mask);
 extern int __ceph_caps_used(struct ceph_inode_info *ci);
 
-- 
1.8.4.2



* [PATCH 04/11] ceph: fix trim caps
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
  2014-01-21  4:15 ` [PATCH 02/11] mds: use ceph_seq_cmp() to compare migrate_seq Yan, Zheng
  2014-01-21  4:15 ` [PATCH 03/11] ceph: fix cache revoke race Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 05/11] ceph: handle -ESTALE reply Yan, Zheng
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

- don't trim auth cap if there are flushing caps
- don't trim auth cap if any 'write' cap is wanted
- allow trimming non-auth cap even if the inode is dirty
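
The three rules can be read as one predicate. The sketch below restates them outside the kernel; struct trim_input, can_trim() and the CAP_* bit values are hypothetical and chosen only for illustration (they are not the CEPH_CAP_* encoding).

```c
#include <stdbool.h>

/* Hypothetical bit values for this sketch only. */
#define CAP_RD      0x1
#define CAP_WR      0x2
#define CAP_ANY_WR  CAP_WR

struct trim_input {
	bool is_auth_cap;  /* cap == ci->i_auth_cap */
	int dirty;         /* dirty caps on the inode */
	int flushing;      /* caps currently being flushed */
	int used;          /* caps in use */
	int wanted;        /* caps wanted by open files */
	int mine;          /* issued | implemented on this cap */
	int oissued;       /* caps issued by other MDSs */
};

/* Returns true if the cap may be trimmed under the rules listed above. */
static bool can_trim(const struct trim_input *in)
{
	if (in->is_auth_cap) {
		if (in->dirty | in->flushing)
			return false;  /* state still has to reach the auth MDS */
		if ((in->used | in->wanted) & CAP_ANY_WR)
			return false;  /* a write cap is used or wanted */
	}
	if ((in->used | in->wanted) & ~in->oissued & in->mine)
		return false;          /* only this cap provides needed bits */
	return true;
}
```

Note that the dirty/flushing test only applies to the auth cap, which is what allows a non-auth cap to be trimmed on a dirty inode.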

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/mds_client.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 4a13f6e..73c7943 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1214,7 +1214,7 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 {
 	struct ceph_mds_session *session = arg;
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	int used, oissued, mine;
+	int used, wanted, oissued, mine;
 
 	if (session->s_trim_caps <= 0)
 		return -1;
@@ -1222,14 +1222,19 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
 	spin_lock(&ci->i_ceph_lock);
 	mine = cap->issued | cap->implemented;
 	used = __ceph_caps_used(ci);
+	wanted = __ceph_caps_file_wanted(ci);
 	oissued = __ceph_caps_issued_other(ci, cap);
 
-	dout("trim_caps_cb %p cap %p mine %s oissued %s used %s\n",
+	dout("trim_caps_cb %p cap %p mine %s oissued %s used %s wanted %s\n",
 	     inode, cap, ceph_cap_string(mine), ceph_cap_string(oissued),
-	     ceph_cap_string(used));
-	if (ci->i_dirty_caps)
-		goto out;   /* dirty caps */
-	if ((used & ~oissued) & mine)
+	     ceph_cap_string(used), ceph_cap_string(wanted));
+	if (cap == ci->i_auth_cap) {
+		if (ci->i_dirty_caps | ci->i_flushing_caps)
+			goto out;
+		if ((used | wanted) & CEPH_CAP_ANY_WR)
+			goto out;
+	}
+	if ((used | wanted) & ~oissued & mine)
 		goto out;   /* we need these caps */
 
 	session->s_trim_caps--;
-- 
1.8.4.2



* [PATCH 05/11] ceph: handle -ESTALE reply
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (2 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 04/11] ceph: fix trim caps Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 06/11] ceph: check inode caps in ceph_d_revalidate Yan, Zheng
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Send requests that operate on a path to the directory's auth MDS if
mode == USE_AUTH_MDS. Always retry using the auth MDS if a non-auth
MDS replies with -ESTALE. Also clean up the code that handles auth
MDS changes.
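
As a rough illustration of the retry policy described above, the decision taken on an -ESTALE reply can be written as a small function. The enum names and on_estale() are hypothetical; current_auth_mds stands in for the result of re-running __choose_mds(), with a negative value meaning no MDS could be chosen.

```c
/* Hypothetical names for this sketch; they mirror the modes used in the text. */
enum direct_mode { MODE_ANY_MDS, MODE_RANDOM_MDS, MODE_AUTH_MDS };

enum estale_action {
	RESEND_VIA_AUTH,      /* switch the request to auth mode and resend */
	RESEND_AUTH_CHANGED,  /* already in auth mode, but the auth MDS moved */
	REPORT_ERROR          /* nothing left to try: hand -ESTALE to the caller */
};

static enum estale_action on_estale(enum direct_mode mode,
				    int sent_to_mds, int current_auth_mds)
{
	if (mode != MODE_AUTH_MDS)
		return RESEND_VIA_AUTH;
	if (current_auth_mds >= 0 && current_auth_mds != sent_to_mds)
		return RESEND_AUTH_CHANGED;
	return REPORT_ERROR;
}
```

The sketch no longer special-cases requests without a target inode, matching the removal of the `!req->r_inode` branch in the diff.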

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/mds_client.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 73c7943..1fd655a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -713,14 +713,15 @@ static int __choose_mds(struct ceph_mds_client *mdsc,
 			struct dentry *dn = get_nonsnap_parent(parent);
 			inode = dn->d_inode;
 			dout("__choose_mds using nonsnap parent %p\n", inode);
-		} else if (req->r_dentry->d_inode) {
+		} else {
 			/* dentry target */
 			inode = req->r_dentry->d_inode;
-		} else {
-			/* dir + name */
-			inode = dir;
-			hash = ceph_dentry_hash(dir, req->r_dentry);
-			is_hash = true;
+			if (!inode || mode == USE_AUTH_MDS) {
+				/* dir + name */
+				inode = dir;
+				hash = ceph_dentry_hash(dir, req->r_dentry);
+				is_hash = true;
+			}
 		}
 	}
 
@@ -2161,26 +2162,16 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 	 */
 	if (result == -ESTALE) {
 		dout("got ESTALE on request %llu", req->r_tid);
-		if (!req->r_inode) {
-			/* do nothing; not an authority problem */
-		} else if (req->r_direct_mode != USE_AUTH_MDS) {
+		if (req->r_direct_mode != USE_AUTH_MDS) {
 			dout("not using auth, setting for that now");
 			req->r_direct_mode = USE_AUTH_MDS;
 			__do_request(mdsc, req);
 			mutex_unlock(&mdsc->mutex);
 			goto out;
 		} else  {
-			struct ceph_inode_info *ci = ceph_inode(req->r_inode);
-			struct ceph_cap *cap = NULL;
-
-			if (req->r_session)
-				cap = ceph_get_cap_for_mds(ci,
-						   req->r_session->s_mds);
-
-			dout("already using auth");
-			if ((!cap || cap != ci->i_auth_cap) ||
-			    (cap->mseq != req->r_sent_on_mseq)) {
-				dout("but cap changed, so resending");
+			int mds = __choose_mds(mdsc, req);
+			if (mds >= 0 && mds != req->r_session->s_mds) {
+				dout("but auth changed, so resending");
 				__do_request(mdsc, req);
 				mutex_unlock(&mdsc->mutex);
 				goto out;
-- 
1.8.4.2



* [PATCH 06/11] ceph: check inode caps in ceph_d_revalidate
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (3 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 05/11] ceph: handle -ESTALE reply Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 07/11] ceph: handle session flush message Yan, Zheng
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Some inodes in a readdir reply may have no caps. A getattr MDS request
for such an inode can return -ESTALE. The fix is to treat a dentry that
links to an inode with no caps as invalid. An invalid dentry causes a
lookup request to be sent to the MDS, and the MDS will send caps back.
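
The lease-based part of the validity rule can be summarised as below. This is only a sketch under the assumption that the three inputs have already been computed; dentry_still_valid() is a hypothetical helper and leaves out the other branches of ceph_d_revalidate().

```c
#include <stdbool.h>

/*
 * Hypothetical condensed form of the lease branch:
 *   lease_valid     - dentry lease or directory lease is valid
 *   has_inode       - the dentry is positive (links to an inode)
 *   inode_has_caps  - what ceph_is_any_caps() would report for that inode
 */
static bool dentry_still_valid(bool lease_valid, bool has_inode,
			       bool inode_has_caps)
{
	if (!lease_valid)
		return false;
	if (has_inode)
		return inode_has_caps;  /* no caps: force a fresh lookup */
	return true;                    /* negative dentry: the lease suffices */
}
```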

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c  | 12 ++++++++++++
 fs/ceph/dir.c   | 11 ++++++++---
 fs/ceph/super.h |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d2154d6..d65ff33 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -891,6 +891,18 @@ static int __ceph_is_any_caps(struct ceph_inode_info *ci)
 	return !RB_EMPTY_ROOT(&ci->i_caps) || ci->i_cap_exporting_mds >= 0;
 }
 
+int ceph_is_any_caps(struct inode *inode)
+{
+	struct ceph_inode_info *ci = ceph_inode(inode);
+	int ret;
+
+	spin_lock(&ci->i_ceph_lock);
+	ret = __ceph_is_any_caps(ci);
+	spin_unlock(&ci->i_ceph_lock);
+
+	return ret;
+}
+
 /*
  * Remove a cap.  Take steps to deal with a racing iterate_session_caps.
  *
diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
index b629e9d..619616d 100644
--- a/fs/ceph/dir.c
+++ b/fs/ceph/dir.c
@@ -1041,14 +1041,19 @@ static int ceph_d_revalidate(struct dentry *dentry, unsigned int flags)
 		valid = 1;
 	} else if (dentry_lease_is_valid(dentry) ||
 		   dir_lease_is_valid(dir, dentry)) {
-		valid = 1;
+		if (dentry->d_inode)
+			valid = ceph_is_any_caps(dentry->d_inode);
+		else
+			valid = 1;
 	}
 
 	dout("d_revalidate %p %s\n", dentry, valid ? "valid" : "invalid");
-	if (valid)
+	if (valid) {
 		ceph_dentry_lru_touch(dentry);
-	else
+	} else {
+		ceph_dir_clear_complete(dir);
 		d_drop(dentry);
+	}
 	iput(dir);
 	return valid;
 }
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 891cda8..a6ba32f 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -782,6 +782,7 @@ extern int ceph_add_cap(struct inode *inode,
 extern void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release);
 extern void ceph_put_cap(struct ceph_mds_client *mdsc,
 			 struct ceph_cap *cap);
+extern int ceph_is_any_caps(struct inode *inode);
 
 extern void __queue_cap_release(struct ceph_mds_session *session, u64 ino,
 				u64 cap_id, u32 migrate_seq, u32 issue_seq);
-- 
1.8.4.2



* [PATCH 07/11] ceph: handle session flush message
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (4 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 06/11] ceph: check inode caps in ceph_d_revalidate Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 08/11] ceph: remove exported caps when handling cap import message Yan, Zheng
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/mds_client.c         | 19 +++++++++++++++++++
 fs/ceph/strings.c            |  2 ++
 include/linux/ceph/ceph_fs.h |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 1fd655a..7c00dd5 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1137,6 +1137,21 @@ static int send_renew_caps(struct ceph_mds_client *mdsc,
 	return 0;
 }
 
+static int send_flushmsg_ack(struct ceph_mds_client *mdsc,
+			     struct ceph_mds_session *session, u64 seq)
+{
+	struct ceph_msg *msg;
+
+	dout("send_flushmsg_ack to mds%d (%s)s seq %lld\n",
+	     session->s_mds, session_state_name(session->s_state), seq);
+	msg = create_session_msg(CEPH_SESSION_FLUSHMSG_ACK, seq);
+	if (!msg)
+		return -ENOMEM;
+	ceph_con_send(&session->s_con, msg);
+	return 0;
+}
+
+
 /*
  * Note new cap ttl, and any transition from stale -> not stale (fresh?).
  *
@@ -2396,6 +2411,10 @@ static void handle_session(struct ceph_mds_session *session,
 		trim_caps(mdsc, session, le32_to_cpu(h->max_caps));
 		break;
 
+	case CEPH_SESSION_FLUSHMSG:
+		send_flushmsg_ack(mdsc, session, seq);
+		break;
+
 	default:
 		pr_err("mdsc_handle_session bad op %d mds%d\n", op, mds);
 		WARN_ON(1);
diff --git a/fs/ceph/strings.c b/fs/ceph/strings.c
index 89fa4a9..4440f447 100644
--- a/fs/ceph/strings.c
+++ b/fs/ceph/strings.c
@@ -41,6 +41,8 @@ const char *ceph_session_op_name(int op)
 	case CEPH_SESSION_RENEWCAPS: return "renewcaps";
 	case CEPH_SESSION_STALE: return "stale";
 	case CEPH_SESSION_RECALL_STATE: return "recall_state";
+	case CEPH_SESSION_FLUSHMSG: return "flushmsg";
+	case CEPH_SESSION_FLUSHMSG_ACK: return "flushmsg_ack";
 	}
 	return "???";
 }
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 2ad7b86..26bb587 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -282,6 +282,8 @@ enum {
 	CEPH_SESSION_RENEWCAPS,
 	CEPH_SESSION_STALE,
 	CEPH_SESSION_RECALL_STATE,
+	CEPH_SESSION_FLUSHMSG,
+	CEPH_SESSION_FLUSHMSG_ACK,
 };
 
 extern const char *ceph_session_op_name(int op);
-- 
1.8.4.2



* [PATCH 08/11] ceph: remove exported caps when handling cap import message
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (5 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 07/11] ceph: handle session flush message Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  5:22   ` Sage Weil
  2014-01-21  4:15 ` [PATCH 09/11] ceph: add open export target session helper Yan, Zheng
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

A version 3 cap import message includes the ID of the exported
caps. This allows us to remove the exported caps if we still haven't
received the corresponding cap export message.

We remove the exported caps because they are stale; keeping them
can compromise consistency.
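
To make the new trailer concrete, here is a user-space sketch of decoding a 21-byte peer record (cap_id, seq, mseq, mds, flags, all little-endian) with a bounds check. struct cap_peer, decode_peer() and the get_le*() helpers are hypothetical names; only the field order and sizes are taken from struct ceph_mds_cap_peer in this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the peer record; hypothetical type for this sketch. */
struct cap_peer {
	uint64_t cap_id;
	uint32_t seq;
	uint32_t mseq;
	int32_t  mds;
	uint8_t  flags;
};

#define PEER_WIRE_LEN 21  /* 8 + 4 + 4 + 4 + 1 bytes, packed */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

/* Returns 0 and fills *out, or -1 if fewer than PEER_WIRE_LEN bytes remain. */
static int decode_peer(const uint8_t *p, const uint8_t *end,
		       struct cap_peer *out)
{
	if (p > end || (size_t)(end - p) < PEER_WIRE_LEN)
		return -1;
	out->cap_id = get_le64(p);
	out->seq    = get_le32(p + 8);
	out->mseq   = get_le32(p + 12);
	out->mds    = (int32_t)get_le32(p + 16);
	out->flags  = p[20];
	return 0;
}
```

The check compares the remaining length against the record size instead of computing `p + size`, which avoids forming a pointer past the end of the buffer. That is a choice made for this sketch, not a claim about what the patch requires.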

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c               | 73 ++++++++++++++++++++++++++++----------------
 include/linux/ceph/ceph_fs.h | 11 ++++++-
 2 files changed, 56 insertions(+), 28 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index d65ff33..44373dc 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -611,6 +611,7 @@ retry:
 		if (ci->i_auth_cap == NULL ||
 		    ceph_seq_cmp(ci->i_auth_cap->mseq, mseq) < 0)
 			ci->i_auth_cap = cap;
+		ci->i_cap_exporting_issued = 0;
 	} else if (ci->i_auth_cap == cap) {
 		ci->i_auth_cap = NULL;
 		spin_lock(&mdsc->cap_dirty_lock);
@@ -2823,10 +2824,12 @@ static void handle_cap_export(struct inode *inode, struct ceph_mds_caps *ex,
  */
 static void handle_cap_import(struct ceph_mds_client *mdsc,
 			      struct inode *inode, struct ceph_mds_caps *im,
+			      struct ceph_mds_cap_peer *ph,
 			      struct ceph_mds_session *session,
 			      void *snaptrace, int snaptrace_len)
 {
 	struct ceph_inode_info *ci = ceph_inode(inode);
+	struct ceph_cap *cap;
 	int mds = session->s_mds;
 	unsigned issued = le32_to_cpu(im->caps);
 	unsigned wanted = le32_to_cpu(im->wanted);
@@ -2834,28 +2837,38 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
 	unsigned mseq = le32_to_cpu(im->migrate_seq);
 	u64 realmino = le64_to_cpu(im->realm);
 	u64 cap_id = le64_to_cpu(im->cap_id);
+	u64 p_cap_id;
+	int peer;
 
-	if (ci->i_cap_exporting_mds >= 0 &&
-	    ceph_seq_cmp(ci->i_cap_exporting_mseq, mseq) < 0) {
-		dout("handle_cap_import inode %p ci %p mds%d mseq %d"
-		     " - cleared exporting from mds%d\n",
-		     inode, ci, mds, mseq,
-		     ci->i_cap_exporting_mds);
-		ci->i_cap_exporting_issued = 0;
-		ci->i_cap_exporting_mseq = 0;
-		ci->i_cap_exporting_mds = -1;
+	if (ph) {
+		p_cap_id = le64_to_cpu(ph->cap_id);
+		peer = le32_to_cpu(ph->mds);
+	} else {
+		p_cap_id = 0;
+		peer = -1;
+	}
 
-		spin_lock(&mdsc->cap_dirty_lock);
-		if (!list_empty(&ci->i_dirty_item)) {
-			dout(" moving %p back to cap_dirty\n", inode);
-			list_move(&ci->i_dirty_item, &mdsc->cap_dirty);
+	dout("handle_cap_import inode %p ci %p mds%d mseq %d peer %d\n",
+	     inode, ci, mds, mseq, peer);
+
+	spin_lock(&ci->i_ceph_lock);
+	cap = peer >= 0 ? __get_cap_for_mds(ci, peer) : NULL;
+	if (cap && cap->cap_id == p_cap_id) {
+		dout(" remove export cap %p mds%d flags %d\n",
+		     cap, peer, ph->flags);
+		if (ph->flags & CEPH_CAP_FLAG_AUTH) {
+			WARN_ON(cap->seq != le32_to_cpu(ph->seq));
+			WARN_ON(cap->mseq != le32_to_cpu(ph->mseq));
 		}
-		spin_unlock(&mdsc->cap_dirty_lock);
-	} else {
-		dout("handle_cap_import inode %p ci %p mds%d mseq %d\n",
-		     inode, ci, mds, mseq);
+		ci->i_cap_exporting_issued = cap->issued;
+		__ceph_remove_cap(cap, (ph->flags & CEPH_CAP_FLAG_RELEASE));
 	}
 
+	/* make sure we re-request max_size, if necessary */
+	ci->i_wanted_max_size = 0;
+	ci->i_requested_max_size = 0;
+	spin_unlock(&ci->i_ceph_lock);
+
 	down_write(&mdsc->snap_rwsem);
 	ceph_update_snap_trace(mdsc, snaptrace, snaptrace+snaptrace_len,
 			       false);
@@ -2866,11 +2879,6 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
 	kick_flushing_inode_caps(mdsc, session, inode);
 	up_read(&mdsc->snap_rwsem);
 
-	/* make sure we re-request max_size, if necessary */
-	spin_lock(&ci->i_ceph_lock);
-	ci->i_wanted_max_size = 0;  /* reset */
-	ci->i_requested_max_size = 0;
-	spin_unlock(&ci->i_ceph_lock);
 }
 
 /*
@@ -2888,6 +2896,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	struct ceph_inode_info *ci;
 	struct ceph_cap *cap;
 	struct ceph_mds_caps *h;
+	struct ceph_mds_cap_peer *peer = NULL;
 	int mds = session->s_mds;
 	int op;
 	u32 seq, mseq;
@@ -2898,12 +2907,14 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	void *snaptrace;
 	size_t snaptrace_len;
 	void *flock;
+	void *end;
 	u32 flock_len;
 	int open_target_sessions = 0;
 
 	dout("handle_caps from mds%d\n", mds);
 
 	/* decode */
+	end = msg->front.iov_base + msg->front.iov_len;
 	tid = le64_to_cpu(msg->hdr.tid);
 	if (msg->front.iov_len < sizeof(*h))
 		goto bad;
@@ -2921,17 +2932,25 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	snaptrace_len = le32_to_cpu(h->snap_trace_len);
 
 	if (le16_to_cpu(msg->hdr.version) >= 2) {
-		void *p, *end;
-
-		p = snaptrace + snaptrace_len;
-		end = msg->front.iov_base + msg->front.iov_len;
+		void *p = snaptrace + snaptrace_len;
 		ceph_decode_32_safe(&p, end, flock_len, bad);
+		if (p + flock_len > end)
+			goto bad;
 		flock = p;
 	} else {
 		flock = NULL;
 		flock_len = 0;
 	}
 
+	if (le16_to_cpu(msg->hdr.version) >= 3) {
+		if (op == CEPH_CAP_OP_IMPORT) {
+			void *p = flock + flock_len;
+			if (p + sizeof(*peer) > end)
+				goto bad;
+			peer = p;
+		}
+	}
+
 	mutex_lock(&session->s_mutex);
 	session->s_seq++;
 	dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq,
@@ -2968,7 +2987,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 		goto done;
 
 	case CEPH_CAP_OP_IMPORT:
-		handle_cap_import(mdsc, inode, h, session,
+		handle_cap_import(mdsc, inode, h, peer, session,
 				  snaptrace, snaptrace_len);
 	}
 
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 26bb587..0a37b98 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -459,7 +459,8 @@ struct ceph_mds_reply_cap {
 	__u8 flags;                    /* CEPH_CAP_FLAG_* */
 } __attribute__ ((packed));
 
-#define CEPH_CAP_FLAG_AUTH  1          /* cap is issued by auth mds */
+#define CEPH_CAP_FLAG_AUTH	(1 << 0)  /* cap is issued by auth mds */
+#define CEPH_CAP_FLAG_RELEASE	(1 << 1)  /* release the cap */
 
 /* inode record, for bundling with mds reply */
 struct ceph_mds_reply_inode {
@@ -660,6 +661,14 @@ struct ceph_mds_caps {
 	__le32 time_warp_seq;
 } __attribute__ ((packed));
 
+struct ceph_mds_cap_peer {
+	__le64 cap_id;
+	__le32 seq;
+	__le32 mseq;
+	__le32 mds;
+	__u8   flags;
+} __attribute__ ((packed));
+
 /* cap release msg head */
 struct ceph_mds_cap_release {
 	__le32 num;                /* number of cap_items that follow */
-- 
1.8.4.2



* [PATCH 09/11] ceph: add open export target session helper
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (6 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 08/11] ceph: remove exported caps when handling cap import message Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 10/11] ceph: add imported caps when handling cap export message Yan, Zheng
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/mds_client.c | 51 ++++++++++++++++++++++++++++++++++++---------------
 fs/ceph/mds_client.h |  2 ++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 7c00dd5..f4f050a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -847,35 +847,56 @@ static int __open_session(struct ceph_mds_client *mdsc,
  *
  * called under mdsc->mutex
  */
+static struct ceph_mds_session *
+__open_export_target_session(struct ceph_mds_client *mdsc, int target)
+{
+	struct ceph_mds_session *session;
+
+	session = __ceph_lookup_mds_session(mdsc, target);
+	if (!session) {
+		session = register_session(mdsc, target);
+		if (IS_ERR(session))
+			return session;
+	}
+	if (session->s_state == CEPH_MDS_SESSION_NEW ||
+	    session->s_state == CEPH_MDS_SESSION_CLOSING)
+		__open_session(mdsc, session);
+
+	return session;
+}
+
+struct ceph_mds_session *
+ceph_mdsc_open_export_target_session(struct ceph_mds_client *mdsc, int target)
+{
+	struct ceph_mds_session *session;
+
+	dout("open_export_target_session to mds%d\n", target);
+
+	mutex_lock(&mdsc->mutex);
+	session = __open_export_target_session(mdsc, target);
+	mutex_unlock(&mdsc->mutex);
+
+	return session;
+}
+
 static void __open_export_target_sessions(struct ceph_mds_client *mdsc,
 					  struct ceph_mds_session *session)
 {
 	struct ceph_mds_info *mi;
 	struct ceph_mds_session *ts;
 	int i, mds = session->s_mds;
-	int target;
 
 	if (mds >= mdsc->mdsmap->m_max_mds)
 		return;
+
 	mi = &mdsc->mdsmap->m_info[mds];
 	dout("open_export_target_sessions for mds%d (%d targets)\n",
 	     session->s_mds, mi->num_export_targets);
 
 	for (i = 0; i < mi->num_export_targets; i++) {
-		target = mi->export_targets[i];
-		ts = __ceph_lookup_mds_session(mdsc, target);
-		if (!ts) {
-			ts = register_session(mdsc, target);
-			if (IS_ERR(ts))
-				return;
-		}
-		if (session->s_state == CEPH_MDS_SESSION_NEW ||
-		    session->s_state == CEPH_MDS_SESSION_CLOSING)
-			__open_session(mdsc, session);
-		else
-			dout(" mds%d target mds%d %p is %s\n", session->s_mds,
-			     i, ts, session_state_name(ts->s_state));
-		ceph_put_mds_session(ts);
+		ts = __open_export_target_session(mdsc, mi->export_targets[i]);
+		if (!IS_ERR(ts))
+			ceph_put_mds_session(ts);
 	}
 }
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index 4c053d0..6828891 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -383,6 +383,8 @@ extern void ceph_mdsc_lease_send_msg(struct ceph_mds_session *session,
 extern void ceph_mdsc_handle_map(struct ceph_mds_client *mdsc,
 				 struct ceph_msg *msg);
 
+extern struct ceph_mds_session *
+ceph_mdsc_open_export_target_session(struct ceph_mds_client *mdsc, int target);
 extern void ceph_mdsc_open_export_target_sessions(struct ceph_mds_client *mdsc,
 					  struct ceph_mds_session *session);
 
-- 
1.8.4.2



* [PATCH 10/11] ceph: add imported caps when handling cap export message
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (7 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 09/11] ceph: add open export target session helper Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  4:15 ` [PATCH 11/11] libceph: support CEPH_FEATURE_EXPORT_PEER Yan, Zheng
  2014-01-21  5:25 ` [PATCH 00/11] Fixes for mds cluster Sage Weil
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

A version 3 cap export message includes information about the imported
caps. This allows us to add the imported caps if the corresponding cap
import message still hasn't been received.

This lets us handle the situation where the importer MDS crashes and
the cap import message is missing.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c  | 220 ++++++++++++++++++++++++++++++++++++--------------------
 fs/ceph/inode.c |   4 +-
 fs/ceph/super.h |   4 +-
 3 files changed, 146 insertions(+), 82 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 44373dc..18c1dc3 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -555,21 +555,34 @@ retry:
 		cap->ci = ci;
 		__insert_cap_node(ci, cap);
 
-		/* clear out old exporting info?  (i.e. on cap import) */
-		if (ci->i_cap_exporting_mds == mds) {
-			ci->i_cap_exporting_issued = 0;
-			ci->i_cap_exporting_mseq = 0;
-			ci->i_cap_exporting_mds = -1;
-		}
-
 		/* add to session cap list */
 		cap->session = session;
 		spin_lock(&session->s_cap_lock);
 		list_add_tail(&cap->session_caps, &session->s_caps);
 		session->s_nr_caps++;
 		spin_unlock(&session->s_cap_lock);
-	} else if (new_cap)
-		ceph_put_cap(mdsc, new_cap);
+	} else {
+		if (new_cap)
+			ceph_put_cap(mdsc, new_cap);
+
+		/*
+		 * auth mds of the inode changed. we received the cap export
+		 * message, but still haven't received the cap import message.
+		 * handle_cap_export() updated the new auth MDS' cap.
+		 *
+		 * "ceph_seq_cmp(seq, cap->seq) <= 0" means we are processing
+		 * a message that was sent before the cap import message. So
+		 * don't remove caps.
+		 */
+		if (ceph_seq_cmp(seq, cap->seq) <= 0) {
+			WARN_ON(cap != ci->i_auth_cap);
+			WARN_ON(cap->cap_id != cap_id);
+			seq = cap->seq;
+			mseq = cap->mseq;
+			issued |= cap->issued;
+			flags |= CEPH_CAP_FLAG_AUTH;
+		}
+	}
 
 	if (!ci->i_snap_realm) {
 		/*
@@ -612,15 +625,8 @@ retry:
 		    ceph_seq_cmp(ci->i_auth_cap->mseq, mseq) < 0)
 			ci->i_auth_cap = cap;
 		ci->i_cap_exporting_issued = 0;
-	} else if (ci->i_auth_cap == cap) {
-		ci->i_auth_cap = NULL;
-		spin_lock(&mdsc->cap_dirty_lock);
-		if (!list_empty(&ci->i_dirty_item)) {
-			dout(" moving %p to cap_dirty_migrating\n", inode);
-			list_move(&ci->i_dirty_item,
-				  &mdsc->cap_dirty_migrating);
-		}
-		spin_unlock(&mdsc->cap_dirty_lock);
+	} else {
+		WARN_ON(ci->i_auth_cap == cap);
 	}
 
 	dout("add_cap inode %p (%llx.%llx) cap %p %s now %s seq %d mds%d\n",
@@ -889,7 +895,7 @@ int __ceph_caps_mds_wanted(struct ceph_inode_info *ci)
  */
 static int __ceph_is_any_caps(struct ceph_inode_info *ci)
 {
-	return !RB_EMPTY_ROOT(&ci->i_caps) || ci->i_cap_exporting_mds >= 0;
+	return !RB_EMPTY_ROOT(&ci->i_caps) || ci->i_cap_exporting_issued;
 }
 
 int ceph_is_any_caps(struct inode *inode)
@@ -1396,13 +1402,10 @@ int __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
 				ci->i_snap_realm->cached_context);
 		dout(" inode %p now dirty snapc %p auth cap %p\n",
 		     &ci->vfs_inode, ci->i_head_snapc, ci->i_auth_cap);
+		WARN_ON(!ci->i_auth_cap);
 		BUG_ON(!list_empty(&ci->i_dirty_item));
 		spin_lock(&mdsc->cap_dirty_lock);
-		if (ci->i_auth_cap)
-			list_add(&ci->i_dirty_item, &mdsc->cap_dirty);
-		else
-			list_add(&ci->i_dirty_item,
-				 &mdsc->cap_dirty_migrating);
+		list_add(&ci->i_dirty_item, &mdsc->cap_dirty);
 		spin_unlock(&mdsc->cap_dirty_lock);
 		if (ci->i_flushing_caps == 0) {
 			ihold(inode);
@@ -2421,6 +2424,22 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 	dout(" size %llu max_size %llu, i_size %llu\n", size, max_size,
 		inode->i_size);
 
+
+	/*
+	 * auth mds of the inode changed. we received the cap export message,
+	 * but still haven't received the cap import message. handle_cap_export
+	 * updated the new auth MDS' cap.
+	 *
+	 * "ceph_seq_cmp(seq, cap->seq) <= 0" means we are processing a message
+	 * that was sent before the cap import message. So don't remove caps.
+	 */
+	if (ceph_seq_cmp(seq, cap->seq) <= 0) {
+		WARN_ON(cap != ci->i_auth_cap);
+		WARN_ON(cap->cap_id != le64_to_cpu(grant->cap_id));
+		seq = cap->seq;
+		newcaps |= cap->issued;
+	}
+
 	/*
 	 * If CACHE is being revoked, and we have no dirty buffers,
 	 * try to invalidate (once).  (If there are dirty buffers, we
@@ -2447,6 +2466,7 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 	issued |= implemented | __ceph_caps_dirty(ci);
 
 	cap->cap_gen = session->s_cap_gen;
+	cap->seq = seq;
 
 	__check_cap_issue(ci, cap, newcaps);
 
@@ -2497,6 +2517,10 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 			    le32_to_cpu(grant->time_warp_seq), &ctime, &mtime,
 			    &atime);
 
+
+	/* file layout may have changed */
+	ci->i_layout = grant->layout;
+
 	/* max size increase? */
 	if (ci->i_auth_cap == cap && max_size != ci->i_max_size) {
 		dout("max_size %lld -> %llu\n", ci->i_max_size, max_size);
@@ -2525,11 +2549,6 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 			check_caps = 1;
 	}
 
-	cap->seq = seq;
-
-	/* file layout may have changed */
-	ci->i_layout = grant->layout;
-
 	/* revocation, grant, or no-op? */
 	if (cap->issued & ~newcaps) {
 		int revoking = cap->issued & ~newcaps;
@@ -2755,65 +2774,114 @@ static void handle_cap_trunc(struct inode *inode,
  * caller holds s_mutex
  */
 static void handle_cap_export(struct inode *inode, struct ceph_mds_caps *ex,
-			      struct ceph_mds_session *session,
-			      int *open_target_sessions)
+			      struct ceph_mds_cap_peer *ph,
+			      struct ceph_mds_session *session)
 {
 	struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
+	struct ceph_mds_session *tsession = NULL;
+	struct ceph_cap *cap, *tcap;
 	struct ceph_inode_info *ci = ceph_inode(inode);
-	int mds = session->s_mds;
+	u64 t_cap_id;
 	unsigned mseq = le32_to_cpu(ex->migrate_seq);
-	struct ceph_cap *cap = NULL, *t;
-	struct rb_node *p;
-	int remember = 1;
+	unsigned t_seq, t_mseq;
+	int target, issued;
+	int mds = session->s_mds;
 
-	dout("handle_cap_export inode %p ci %p mds%d mseq %d\n",
-	     inode, ci, mds, mseq);
+	if (ph) {
+		t_cap_id = le64_to_cpu(ph->cap_id);
+		t_seq = le32_to_cpu(ph->seq);
+		t_mseq = le32_to_cpu(ph->mseq);
+		target = le32_to_cpu(ph->mds);
+	} else {
+		t_cap_id = t_seq = t_mseq = 0;
+		target = -1;
+	}
 
+	dout("handle_cap_export inode %p ci %p mds%d mseq %d target %d\n",
+	     inode, ci, mds, mseq, target);
+retry:
 	spin_lock(&ci->i_ceph_lock);
+	cap = __get_cap_for_mds(ci, mds);
+	if (!cap)
+		goto out_unlock;
 
-	/* make sure we haven't seen a higher mseq */
-	for (p = rb_first(&ci->i_caps); p; p = rb_next(p)) {
-		t = rb_entry(p, struct ceph_cap, ci_node);
-		if (ceph_seq_cmp(t->mseq, mseq) > 0) {
-			dout(" higher mseq on cap from mds%d\n",
-			     t->session->s_mds);
-			remember = 0;
-		}
-		if (t->session->s_mds == mds)
-			cap = t;
+	if (target < 0) {
+		__ceph_remove_cap(cap, false);
+		goto out_unlock;
 	}
 
-	if (cap) {
-		if (remember) {
-			/* make note */
-			ci->i_cap_exporting_mds = mds;
-			ci->i_cap_exporting_mseq = mseq;
-			ci->i_cap_exporting_issued = cap->issued;
-
-			/*
-			 * make sure we have open sessions with all possible
-			 * export targets, so that we get the matching IMPORT
-			 */
-			*open_target_sessions = 1;
+	/*
+	 * now we know we haven't received the cap import message yet
+	 * because the exported cap still exists.
+	 */
 
-			/*
-			 * we can't flush dirty caps that we've seen the
-			 * EXPORT but no IMPORT for
-			 */
-			spin_lock(&mdsc->cap_dirty_lock);
-			if (!list_empty(&ci->i_dirty_item)) {
-				dout(" moving %p to cap_dirty_migrating\n",
-				     inode);
-				list_move(&ci->i_dirty_item,
-					  &mdsc->cap_dirty_migrating);
+	issued = cap->issued;
+	WARN_ON(issued != cap->implemented);
+
+	tcap = __get_cap_for_mds(ci, target);
+	if (tcap) {
+		/* already have caps from the target */
+		if (tcap->cap_id != t_cap_id ||
+		    ceph_seq_cmp(tcap->seq, t_seq) < 0) {
+			dout(" updating import cap %p mds%d\n", tcap, target);
+			tcap->cap_id = t_cap_id;
+			tcap->seq = t_seq - 1;
+			tcap->issue_seq = t_seq - 1;
+			tcap->mseq = t_mseq;
+			tcap->issued |= issued;
+			tcap->implemented |= issued;
+			if (cap == ci->i_auth_cap)
+				ci->i_auth_cap = tcap;
+			if (ci->i_flushing_caps && ci->i_auth_cap == tcap) {
+				spin_lock(&mdsc->cap_dirty_lock);
+				list_move_tail(&ci->i_flushing_item,
+					       &tcap->session->s_cap_flushing);
+				spin_unlock(&mdsc->cap_dirty_lock);
 			}
-			spin_unlock(&mdsc->cap_dirty_lock);
 		}
 		__ceph_remove_cap(cap, false);
+		goto out_unlock;
+	}
+
+	if (tsession) {
+		int flag = (cap == ci->i_auth_cap) ? CEPH_CAP_FLAG_AUTH : 0;
+		spin_unlock(&ci->i_ceph_lock);
+		/* add placeholder for the export target */
+		ceph_add_cap(inode, tsession, t_cap_id, -1, issued, 0,
+			     t_seq - 1, t_mseq, (u64)-1, flag, NULL);
+		goto retry;
 	}
-	/* else, we already released it */
 
 	spin_unlock(&ci->i_ceph_lock);
+	mutex_unlock(&session->s_mutex);
+
+	/* open target session */
+	tsession = ceph_mdsc_open_export_target_session(mdsc, target);
+	if (!IS_ERR(tsession)) {
+		if (mds > target) {
+			mutex_lock(&session->s_mutex);
+			mutex_lock_nested(&tsession->s_mutex,
+					  SINGLE_DEPTH_NESTING);
+		} else {
+			mutex_lock(&tsession->s_mutex);
+			mutex_lock_nested(&session->s_mutex,
+					  SINGLE_DEPTH_NESTING);
+		}
+		ceph_add_cap_releases(mdsc, tsession);
+	} else {
+		WARN_ON(1);
+		tsession = NULL;
+		target = -1;
+	}
+	goto retry;
+
+out_unlock:
+	spin_unlock(&ci->i_ceph_lock);
+	mutex_unlock(&session->s_mutex);
+	if (tsession) {
+		mutex_unlock(&tsession->s_mutex);
+		ceph_put_mds_session(tsession);
+	}
 }
 
 /*
@@ -2909,7 +2977,6 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	void *flock;
 	void *end;
 	u32 flock_len;
-	int open_target_sessions = 0;
 
 	dout("handle_caps from mds%d\n", mds);
 
@@ -2948,6 +3015,9 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 			if (p + sizeof(*peer) > end)
 				goto bad;
 			peer = p;
+		} else if (op == CEPH_CAP_OP_EXPORT) {
+			/* recorded in unused fields */
+			peer = (void *)&h->size;
 		}
 	}
 
@@ -2983,8 +3053,8 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 		goto done;
 
 	case CEPH_CAP_OP_EXPORT:
-		handle_cap_export(inode, h, session, &open_target_sessions);
-		goto done;
+		handle_cap_export(inode, h, peer, session);
+		goto done_unlocked;
 
 	case CEPH_CAP_OP_IMPORT:
 		handle_cap_import(mdsc, inode, h, peer, session,
@@ -3039,8 +3109,6 @@ done:
 done_unlocked:
 	if (inode)
 		iput(inode);
-	if (open_target_sessions)
-		ceph_mdsc_open_export_target_sessions(mdsc, session);
 	return;
 
 bad:
diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 3db97ba..6fc10a7 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -336,12 +336,10 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
 	ci->i_hold_caps_min = 0;
 	ci->i_hold_caps_max = 0;
 	INIT_LIST_HEAD(&ci->i_cap_delay_list);
-	ci->i_cap_exporting_mds = 0;
-	ci->i_cap_exporting_mseq = 0;
-	ci->i_cap_exporting_issued = 0;
 	INIT_LIST_HEAD(&ci->i_cap_snaps);
 	ci->i_head_snapc = NULL;
 	ci->i_snap_caps = 0;
+	ci->i_cap_exporting_issued = 0;
 
 	for (i = 0; i < CEPH_FILE_MODE_NUM; i++)
 		ci->i_nr_by_mode[i] = 0;
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index a6ba32f..c299f7d 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -287,14 +287,12 @@ struct ceph_inode_info {
 	unsigned long i_hold_caps_min; /* jiffies */
 	unsigned long i_hold_caps_max; /* jiffies */
 	struct list_head i_cap_delay_list;  /* for delayed cap release to mds */
-	int i_cap_exporting_mds;         /* to handle cap migration between */
-	unsigned i_cap_exporting_mseq;   /*  mds's. */
-	unsigned i_cap_exporting_issued;
 	struct ceph_cap_reservation i_cap_migration_resv;
 	struct list_head i_cap_snaps;   /* snapped state pending flush to mds */
 	struct ceph_snap_context *i_head_snapc;  /* set if wr_buffer_head > 0 or
 						    dirty|flushing caps */
 	unsigned i_snap_caps;           /* cap bits for snapped files */
+	unsigned i_cap_exporting_issued;
 
 	int i_nr_by_mode[CEPH_FILE_MODE_NUM];  /* open file counts */
 
-- 
1.8.4.2
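
For readers following the handle_cap_grant hunk above: the
"ceph_seq_cmp(seq, cap->seq) <= 0" test relies on a comparison that still
orders 32-bit sequence numbers correctly after the counter wraps. A minimal
user-space sketch, assuming a signed-difference comparison. The names
seq_cmp() and grant_is_stale() are hypothetical and exist only for
illustration; they are not the kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-aware comparison of two 32-bit sequence numbers.
 * Returns <0, 0 or >0 when a is older than, equal to, or newer than b,
 * provided the two values are less than 2^31 apart.
 * (Hypothetical stand-in, not the kernel's ceph_seq_cmp().)
 */
static int32_t seq_cmp(uint32_t a, uint32_t b)
{
	/* unsigned subtraction wraps; reinterpret the difference as signed */
	return (int32_t)(a - b);
}

/*
 * A grant whose seq is not newer than the seq already recorded on the cap
 * was sent before the cap import message, so it must not remove caps.
 */
static bool grant_is_stale(uint32_t grant_seq, uint32_t cap_seq)
{
	return seq_cmp(grant_seq, cap_seq) <= 0;
}
```

With a plain ">" comparison, a seq of 0 that follows 0xFFFFFFFF would look
older than its predecessor; the signed difference treats it as newer.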



* [PATCH 11/11] libceph: support CEPH_FEATURE_EXPORT_PEER
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (8 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 10/11] ceph: add imported caps when handling cap export message Yan, Zheng
@ 2014-01-21  4:15 ` Yan, Zheng
  2014-01-21  5:25 ` [PATCH 00/11] Fixes for mds cluster Sage Weil
  10 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  4:15 UTC (permalink / raw)
  To: ceph-devel; +Cc: sage, Yan, Zheng

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 include/linux/ceph/ceph_features.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h
index 5f42e44..d21b34d 100644
--- a/include/linux/ceph/ceph_features.h
+++ b/include/linux/ceph/ceph_features.h
@@ -80,7 +80,8 @@ static inline u64 ceph_sanitize_features(u64 features)
 	 CEPH_FEATURE_CRUSH_TUNABLES2 |		\
 	 CEPH_FEATURE_REPLY_CREATE_INODE |	\
 	 CEPH_FEATURE_OSDHASHPSPOOL |		\
-	 CEPH_FEATURE_CRUSH_V2)
+	 CEPH_FEATURE_CRUSH_V2 |		\
+	 CEPH_FEATURE_EXPORT_PEER)
 
 #define CEPH_FEATURES_REQUIRED_DEFAULT   \
 	(CEPH_FEATURE_NOSRCADDR |	 \
-- 
1.8.4.2
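
The patch above only adds one bit to the default supported set. A rough
sketch of how such a feature mask is usually consulted, assuming that a
feature is usable on a connection only when both sides list it. The bit
positions and the names FEATURE_*, negotiated_features() and has_feature()
are hypothetical; the real values live in ceph_features.h and are not
reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define FEATURE_CRUSH_V2	(1ULL << 0)
#define FEATURE_EXPORT_PEER	(1ULL << 1)

/* What this side is able to speak. */
#define FEATURES_SUPPORTED	(FEATURE_CRUSH_V2 | FEATURE_EXPORT_PEER)

/* Features usable on a connection: those that both sides support. */
static uint64_t negotiated_features(uint64_t ours, uint64_t theirs)
{
	return ours & theirs;
}

/* True when every bit in "bit" is present in "features". */
static bool has_feature(uint64_t features, uint64_t bit)
{
	return (features & bit) == bit;
}
```

Under this assumption, a peer that does not list the export-peer bit never
sees it in the negotiated mask, so the older message encoding stays in use.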



* Re: [PATCH 08/11] ceph: remove exported caps when handling cap import message
  2014-01-21  4:15 ` [PATCH 08/11] ceph: remove exported caps when handling cap import message Yan, Zheng
@ 2014-01-21  5:22   ` Sage Weil
  2014-01-21  6:49     ` Yan, Zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Sage Weil @ 2014-01-21  5:22 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Tue, 21 Jan 2014, Yan, Zheng wrote:
> The version 3 cap import message includes the ID of the exported
> caps. It allows us to remove the exported caps if we still haven't
> received the corresponding cap export message.
> 
> We remove the exported caps because they are stale; keeping them
> can compromise consistency.

Was there any testing of this with the new client and an old mds?  That 
combination obviously will suffer from this bug, but ideally it should 
handle a basic non-racy migration.

> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c               | 73 ++++++++++++++++++++++++++++----------------
>  include/linux/ceph/ceph_fs.h | 11 ++++++-
>  2 files changed, 56 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index d65ff33..44373dc 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -611,6 +611,7 @@ retry:
>  		if (ci->i_auth_cap == NULL ||
>  		    ceph_seq_cmp(ci->i_auth_cap->mseq, mseq) < 0)
>  			ci->i_auth_cap = cap;
> +		ci->i_cap_exporting_issued = 0;
>  	} else if (ci->i_auth_cap == cap) {
>  		ci->i_auth_cap = NULL;
>  		spin_lock(&mdsc->cap_dirty_lock);
> @@ -2823,10 +2824,12 @@ static void handle_cap_export(struct inode *inode, struct ceph_mds_caps *ex,
>   */
>  static void handle_cap_import(struct ceph_mds_client *mdsc,
>  			      struct inode *inode, struct ceph_mds_caps *im,
> +			      struct ceph_mds_cap_peer *ph,
>  			      struct ceph_mds_session *session,
>  			      void *snaptrace, int snaptrace_len)
>  {
>  	struct ceph_inode_info *ci = ceph_inode(inode);
> +	struct ceph_cap *cap;
>  	int mds = session->s_mds;
>  	unsigned issued = le32_to_cpu(im->caps);
>  	unsigned wanted = le32_to_cpu(im->wanted);
> @@ -2834,28 +2837,38 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
>  	unsigned mseq = le32_to_cpu(im->migrate_seq);
>  	u64 realmino = le64_to_cpu(im->realm);
>  	u64 cap_id = le64_to_cpu(im->cap_id);
> +	u64 p_cap_id;
> +	int peer;
>  
> -	if (ci->i_cap_exporting_mds >= 0 &&
> -	    ceph_seq_cmp(ci->i_cap_exporting_mseq, mseq) < 0) {
> -		dout("handle_cap_import inode %p ci %p mds%d mseq %d"
> -		     " - cleared exporting from mds%d\n",
> -		     inode, ci, mds, mseq,
> -		     ci->i_cap_exporting_mds);
> -		ci->i_cap_exporting_issued = 0;
> -		ci->i_cap_exporting_mseq = 0;
> -		ci->i_cap_exporting_mds = -1;
> +	if (ph) {
> +		p_cap_id = le64_to_cpu(ph->cap_id);
> +		peer = le32_to_cpu(ph->mds);
> +	} else {
> +		p_cap_id = 0;
> +		peer = -1;
> +	}
>  
> -		spin_lock(&mdsc->cap_dirty_lock);
> -		if (!list_empty(&ci->i_dirty_item)) {
> -			dout(" moving %p back to cap_dirty\n", inode);
> -			list_move(&ci->i_dirty_item, &mdsc->cap_dirty);
> +	dout("handle_cap_import inode %p ci %p mds%d mseq %d peer %d\n",
> +	     inode, ci, mds, mseq, peer);
> +
> +	spin_lock(&ci->i_ceph_lock);
> +	cap = peer >= 0 ? __get_cap_for_mds(ci, peer) : NULL;
> +	if (cap && cap->cap_id == p_cap_id) {
> +		dout(" remove export cap %p mds%d flags %d\n",
> +		     cap, peer, ph->flags);
> +		if (ph->flags & CEPH_CAP_FLAG_AUTH) {
> +			WARN_ON(cap->seq != le32_to_cpu(ph->seq));
> +			WARN_ON(cap->mseq != le32_to_cpu(ph->mseq));
>  		}
> -		spin_unlock(&mdsc->cap_dirty_lock);
> -	} else {
> -		dout("handle_cap_import inode %p ci %p mds%d mseq %d\n",
> -		     inode, ci, mds, mseq);
> +		ci->i_cap_exporting_issued = cap->issued;
> +		__ceph_remove_cap(cap, (ph->flags & CEPH_CAP_FLAG_RELEASE));
>  	}
>  
> +	/* make sure we re-request max_size, if necessary */
> +	ci->i_wanted_max_size = 0;
> +	ci->i_requested_max_size = 0;
> +	spin_unlock(&ci->i_ceph_lock);
> +
>  	down_write(&mdsc->snap_rwsem);
>  	ceph_update_snap_trace(mdsc, snaptrace, snaptrace+snaptrace_len,
>  			       false);
> @@ -2866,11 +2879,6 @@ static void handle_cap_import(struct ceph_mds_client *mdsc,
>  	kick_flushing_inode_caps(mdsc, session, inode);
>  	up_read(&mdsc->snap_rwsem);
>  
> -	/* make sure we re-request max_size, if necessary */
> -	spin_lock(&ci->i_ceph_lock);
> -	ci->i_wanted_max_size = 0;  /* reset */
> -	ci->i_requested_max_size = 0;
> -	spin_unlock(&ci->i_ceph_lock);
>  }
>  
>  /*
> @@ -2888,6 +2896,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  	struct ceph_inode_info *ci;
>  	struct ceph_cap *cap;
>  	struct ceph_mds_caps *h;
> +	struct ceph_mds_cap_peer *peer = NULL;
>  	int mds = session->s_mds;
>  	int op;
>  	u32 seq, mseq;
> @@ -2898,12 +2907,14 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  	void *snaptrace;
>  	size_t snaptrace_len;
>  	void *flock;
> +	void *end;
>  	u32 flock_len;
>  	int open_target_sessions = 0;
>  
>  	dout("handle_caps from mds%d\n", mds);
>  
>  	/* decode */
> +	end = msg->front.iov_base + msg->front.iov_len;
>  	tid = le64_to_cpu(msg->hdr.tid);
>  	if (msg->front.iov_len < sizeof(*h))
>  		goto bad;
> @@ -2921,17 +2932,25 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  	snaptrace_len = le32_to_cpu(h->snap_trace_len);
>  
>  	if (le16_to_cpu(msg->hdr.version) >= 2) {
> -		void *p, *end;
> -
> -		p = snaptrace + snaptrace_len;
> -		end = msg->front.iov_base + msg->front.iov_len;
> +		void *p = snaptrace + snaptrace_len;
>  		ceph_decode_32_safe(&p, end, flock_len, bad);
> +		if (p + flock_len > end)
> +			goto bad;
>  		flock = p;
>  	} else {
>  		flock = NULL;
>  		flock_len = 0;
>  	}
>  
> +	if (le16_to_cpu(msg->hdr.version) >= 3) {
> +		if (op == CEPH_CAP_OP_IMPORT) {
> +			void *p = flock + flock_len;
> +			if (p + sizeof(*peer) > end)
> +				goto bad;
> +			peer = p;
> +		}
> +	}
> +
>  	mutex_lock(&session->s_mutex);
>  	session->s_seq++;
>  	dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq,
> @@ -2968,7 +2987,7 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  		goto done;
>  
>  	case CEPH_CAP_OP_IMPORT:
> -		handle_cap_import(mdsc, inode, h, session,
> +		handle_cap_import(mdsc, inode, h, peer, session,
>  				  snaptrace, snaptrace_len);
>  	}
>  
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 26bb587..0a37b98 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -459,7 +459,8 @@ struct ceph_mds_reply_cap {
>  	__u8 flags;                    /* CEPH_CAP_FLAG_* */
>  } __attribute__ ((packed));
>  
> -#define CEPH_CAP_FLAG_AUTH  1          /* cap is issued by auth mds */
> +#define CEPH_CAP_FLAG_AUTH	(1 << 0)  /* cap is issued by auth mds */
> +#define CEPH_CAP_FLAG_RELEASE	(1 << 1)  /* release the cap */
>  
>  /* inode record, for bundling with mds reply */
>  struct ceph_mds_reply_inode {
> @@ -660,6 +661,14 @@ struct ceph_mds_caps {
>  	__le32 time_warp_seq;
>  } __attribute__ ((packed));
>  
> +struct ceph_mds_cap_peer {
> +	__le64 cap_id;
> +	__le32 seq;
> +	__le32 mseq;
> +	__le32 mds;
> +	__u8   flags;
> +} __attribute__ ((packed));
> +
>  /* cap release msg head */
>  struct ceph_mds_cap_release {
>  	__le32 num;                /* number of cap_items that follow */
> -- 
> 1.8.4.2
> 
> 
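
For reference, the version 3 decode in the quoted patch amounts to a bounds
check followed by reading a packed little-endian record. A user-space sketch
under that reading: the field order is taken from struct ceph_mds_cap_peer
in the patch, while struct cap_peer, get_le32(), get_le64() and
decode_cap_peer() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the packed struct in the patch, in host byte order. */
struct cap_peer {
	uint64_t cap_id;
	uint32_t seq;
	uint32_t mseq;
	uint32_t mds;
	uint8_t  flags;
};

#define CAP_PEER_WIRE_LEN 21	/* 8 + 4 + 4 + 4 + 1 bytes, packed */

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t get_le64(const uint8_t *p)
{
	return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/*
 * Decode one peer record starting at p. Returns 0 on success, or -1 when
 * the record would run past end (the "goto bad" case in the patch).
 */
static int decode_cap_peer(const uint8_t *p, const uint8_t *end,
			   struct cap_peer *out)
{
	if (p > end || (size_t)(end - p) < CAP_PEER_WIRE_LEN)
		return -1;

	out->cap_id = get_le64(p);
	out->seq    = get_le32(p + 8);
	out->mseq   = get_le32(p + 12);
	out->mds    = get_le32(p + 16);
	out->flags  = p[20];
	return 0;
}
```

The length check comes first so that a truncated message is rejected before
any field is read.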


* Re: [PATCH 00/11] Fixes for mds cluster
  2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
                   ` (9 preceding siblings ...)
  2014-01-21  4:15 ` [PATCH 11/11] libceph: support CEPH_FEATURE_EXPORT_PEER Yan, Zheng
@ 2014-01-21  5:25 ` Sage Weil
  10 siblings, 0 replies; 14+ messages in thread
From: Sage Weil @ 2014-01-21  5:25 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

It seems that patch 1 is missing?

I reviewed 2-9 and 11; ran out of steam on 10.  If this is holding up to 
your stress testing, though, I suggest pushing it to the testing branch 
so the nightly can run against it.

Thanks!
sage


On Tue, 21 Jan 2014, Yan, Zheng wrote:

> The last 5 patches are client part of protocol change that handles
> corner cases of capability import/export. The mds counterpart is at
>   https://github.com/ceph/ceph.git wip-mds-cluster2
> 
> The rest patches fix -ESTALE and other misc issues with mds cluster
> 
> These patches are also at:
>   https://github.com/ceph/ceph-client.git wip-mds-cluster
> 
> Regards
> Yan, Zheng
> 
> 


* Re: [PATCH 08/11] ceph: remove exported caps when handling cap import message
  2014-01-21  5:22   ` Sage Weil
@ 2014-01-21  6:49     ` Yan, Zheng
  0 siblings, 0 replies; 14+ messages in thread
From: Yan, Zheng @ 2014-01-21  6:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 01/21/2014 01:22 PM, Sage Weil wrote:
> On Tue, 21 Jan 2014, Yan, Zheng wrote:
>> The version 3 cap import message includes the ID of the exported
>> caps. It allows us to remove the exported caps if we still haven't
>> received the corresponding cap export message.
>>
>> We remove the exported caps because they are stale; keeping them
>> can compromise consistency.
> 
> Was there any testing of this with the new client and an old mds?  That 
> combination obviously will suffer from this bug, but ideally it should 
> handle a basic non-racy migration.

I did run tests with an old mds. The only behavior change with an old mds is
that exporting caps are no longer remembered. If the client receives the cap
export message and then the cap import message, there is a period of time
during which the corresponding inode has no cap. An inode with no cap is not
good, but I don't think it causes any big issue.
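
To make that window concrete, here is a toy model of the old-mds case, where
the export message carries no peer information: the export handler just drops
the exported cap, and the inode holds no cap until the import arrives. All
names (struct inode_caps, cap_count(), on_export_no_peer(), on_import()) are
hypothetical and only illustrate the ordering; this is not the client code.

```c
#include <stdbool.h>

#define MAX_MDS 4

/* Toy model: which MDS ranks the client currently holds a cap from. */
struct inode_caps {
	bool have_cap[MAX_MDS];
};

/* Number of MDS ranks the inode currently has a cap from. */
static int cap_count(const struct inode_caps *ic)
{
	int i, n = 0;

	for (i = 0; i < MAX_MDS; i++)
		if (ic->have_cap[i])
			n++;
	return n;
}

/* Export without peer information (old mds): drop the exported cap. */
static void on_export_no_peer(struct inode_caps *ic, int from_mds)
{
	ic->have_cap[from_mds] = false;
}

/* Import: the cap from the new auth mds is added. */
static void on_import(struct inode_caps *ic, int to_mds)
{
	ic->have_cap[to_mds] = true;
}
```

Calling on_export_no_peer() for mds0 and then on_import() for mds1 takes the
count from 1 to 0 and back to 1; the 0 in the middle is the period described
above.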

Regards
Yan, Zheng





end of thread, other threads:[~2014-01-21  6:49 UTC | newest]

Thread overview: 14+ messages
2014-01-21  4:15 [PATCH 00/11] Fixes for mds cluster Yan, Zheng
2014-01-21  4:15 ` [PATCH 02/11] mds: use ceph_seq_cmp() to compare migrate_seq Yan, Zheng
2014-01-21  4:15 ` [PATCH 03/11] ceph: fix cache revoke race Yan, Zheng
2014-01-21  4:15 ` [PATCH 04/11] ceph: fix trim caps Yan, Zheng
2014-01-21  4:15 ` [PATCH 05/11] ceph: handle -ESTALE reply Yan, Zheng
2014-01-21  4:15 ` [PATCH 06/11] ceph: check inode caps in ceph_d_revalidate Yan, Zheng
2014-01-21  4:15 ` [PATCH 07/11] ceph: handle session flush message Yan, Zheng
2014-01-21  4:15 ` [PATCH 08/11] ceph: remove exported caps when handling cap import message Yan, Zheng
2014-01-21  5:22   ` Sage Weil
2014-01-21  6:49     ` Yan, Zheng
2014-01-21  4:15 ` [PATCH 09/11] ceph: add open export target session helper Yan, Zheng
2014-01-21  4:15 ` [PATCH 10/11] ceph: add imported caps when handling cap export message Yan, Zheng
2014-01-21  4:15 ` [PATCH 11/11] libceph: support CEPH_FEATURE_EXPORT_PEER Yan, Zheng
2014-01-21  5:25 ` [PATCH 00/11] Fixes for mds cluster Sage Weil
