All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] fixes for cephfs kernel client
@ 2013-01-04 11:34 Yan, Zheng
  2013-01-04 11:34 ` [PATCH 1/6] ceph: re-calculate truncate_size for strip object Yan, Zheng
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

These patches are client side modification for multiple MDS setup.
I'm not quite sure if patch 6 is correct, please review it.

These patches are also in:
  git://github.com/ukernel/ceph-client.git wip-ceph

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] ceph: re-calculate truncate_size for strip object
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:04   ` Sage Weil
  2013-01-04 11:34 ` [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps Yan, Zheng
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Otherwise osd may truncate the object to larger size.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 net/ceph/osd_client.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index eb9a444..267f183 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -76,8 +76,16 @@ int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
 		     orig_len - *plen, off, *plen);
 
 	if (op_has_extent(op->op)) {
+		u32 osize = le32_to_cpu(layout->fl_object_size);
 		op->extent.offset = objoff;
 		op->extent.length = objlen;
+		if (op->extent.truncate_size <= off - objoff) {
+			op->extent.truncate_size = 0;
+		} else {
+			op->extent.truncate_size -= off - objoff;
+			if (op->extent.truncate_size > osize)
+				op->extent.truncate_size = osize;
+		}
 	}
 	req->r_num_pages = calc_pages_for(off, *plen);
 	req->r_page_alignment = off & ~PAGE_MASK;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
  2013-01-04 11:34 ` [PATCH 1/6] ceph: re-calculate truncate_size for strip object Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:05   ` Sage Weil
  2013-01-04 11:34 ` [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS Yan, Zheng
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a1d9bb3..a9fe2d5 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -611,8 +611,16 @@ retry:
 
 	if (flags & CEPH_CAP_FLAG_AUTH)
 		ci->i_auth_cap = cap;
-	else if (ci->i_auth_cap == cap)
+	else if (ci->i_auth_cap == cap) {
 		ci->i_auth_cap = NULL;
+		spin_lock(&mdsc->cap_dirty_lock);
+		if (!list_empty(&ci->i_dirty_item)) {
+			dout(" moving %p to cap_dirty_migrating\n", inode);
+			list_move(&ci->i_dirty_item,
+				  &mdsc->cap_dirty_migrating);
+		}
+		spin_unlock(&mdsc->cap_dirty_lock);
+	}
 
 	dout("add_cap inode %p (%llx.%llx) cap %p %s now %s seq %d mds%d\n",
 	     inode, ceph_vinop(inode), cap, ceph_cap_string(issued),
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
  2013-01-04 11:34 ` [PATCH 1/6] ceph: re-calculate truncate_size for strip object Yan, Zheng
  2013-01-04 11:34 ` [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:09   ` Sage Weil
  2013-01-04 11:34 ` [PATCH 4/6] ceph: allocate cap_release message when receiving cap import Yan, Zheng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

Allow revoking duplicated caps issued by non-auth MDS if these caps
are also issued by auth MDS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a9fe2d5..c90b245 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1468,7 +1468,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct inode *inode = &ci->vfs_inode;
 	struct ceph_cap *cap;
-	int file_wanted, used;
+	int file_wanted, used, cap_used;
 	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
 	int issued, implemented, want, retain, revoking, flushing = 0;
 	int mds = -1;   /* keep track of how far we've gone through i_caps list
@@ -1577,6 +1577,10 @@ retry_locked:
 		     ceph_cap_string(cap->implemented),
 		     ceph_cap_string(revoking));
 
+		cap_used = used;
+		if (ci->i_auth_cap && cap != ci->i_auth_cap)
+			cap_used &= ~ci->i_auth_cap->issued;
+
 		if (cap == ci->i_auth_cap &&
 		    (cap->issued & CEPH_CAP_FILE_WR)) {
 			/* request larger max_size from MDS? */
@@ -1601,7 +1605,7 @@ retry_locked:
 		}
 
 		/* completed revocation? going down and there are no caps? */
-		if (revoking && (revoking & used) == 0) {
+		if (revoking && (revoking & cap_used) == 0) {
 			dout("completed revocation of %s\n",
 			     ceph_cap_string(cap->implemented & ~cap->issued));
 			goto ack;
@@ -1678,8 +1682,8 @@ ack:
 		sent++;
 
 		/* __send_cap drops i_ceph_lock */
-		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, used, want,
-				      retain, flushing, NULL);
+		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, cap_used,
+				      want, retain, flushing, NULL);
 		goto retry; /* retake i_ceph_lock and restart our cap scan. */
 	}
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] ceph: allocate cap_release message when receiving cap import
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
                   ` (2 preceding siblings ...)
  2013-01-04 11:34 ` [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:12   ` Sage Weil
  2013-01-04 11:34 ` [PATCH 5/6] ceph: check mds_wanted for imported cap Yan, Zheng
  2013-01-04 11:34 ` [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work Yan, Zheng
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

When client wants to release an imported cap, it's possible there
is no reserved cap_release message in corresponding mds session.
so __queue_cap_release causes kernel panic.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index c90b245..7e90299 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2832,6 +2832,9 @@ void ceph_handle_caps(struct ceph_mds_session *session,
 	dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq,
 	     (unsigned)seq);
 
+	if (op == CEPH_CAP_OP_IMPORT)
+		ceph_add_cap_releases(mdsc, session);
+
 	/* lookup ino */
 	inode = ceph_find_inode(sb, vino);
 	ci = ceph_inode(inode);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
                   ` (3 preceding siblings ...)
  2013-01-04 11:34 ` [PATCH 4/6] ceph: allocate cap_release message when receiving cap import Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:20   ` Sage Weil
  2013-01-04 11:34 ` [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work Yan, Zheng
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

The MDS may have incorrect wanted caps after importing caps. So the
client should check the value mds has and send cap update if necessary.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 7e90299..16c10f8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2425,10 +2425,12 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 	     ceph_cap_string(used),
 	     ceph_cap_string(dirty));
 	if (wanted != le32_to_cpu(grant->wanted)) {
-		dout("mds wanted %s -> %s\n",
-		     ceph_cap_string(le32_to_cpu(grant->wanted)),
-		     ceph_cap_string(wanted));
-		grant->wanted = cpu_to_le32(wanted);
+		dout("mds wanted = %s\n",
+		     ceph_cap_string(le32_to_cpu(grant->wanted)));
+		/* imported cap may not have correct mds_wanted */
+		if (cap == ci->i_auth_cap &&
+		    (wanted & ~(cap->mds_wanted | cap->issued)))
+			check_caps = 1;
 	}
 
 	cap->seq = seq;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
  2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
                   ` (4 preceding siblings ...)
  2013-01-04 11:34 ` [PATCH 5/6] ceph: check mds_wanted for imported cap Yan, Zheng
@ 2013-01-04 11:34 ` Yan, Zheng
  2013-01-06  6:00   ` Sage Weil
  5 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-04 11:34 UTC (permalink / raw)
  To: ceph-devel, sage; +Cc: Yan, Zheng

From: "Yan, Zheng" <zheng.z.yan@intel.com>

In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
So ceph_get_caps() can be called while i_mutex is locked. If there is
pending vmtruncate, ceph_get_caps() will wait for it to complete, but
ceph_vmtruncate_work() is blocked by the locked i_mutex.

There are several places that call __ceph_do_pending_vmtruncate()
without holding the i_mutex, I think it's OK to not acquire the i_mutex
in ceph_vmtruncate_work()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/inode.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 81613bc..c895c7f 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -1420,7 +1420,7 @@ out:
 
 
 /*
- * called by trunc_wq; take i_mutex ourselves
+ * called by trunc_wq;
  *
  * We also truncate in a separate thread as well.
  */
@@ -1431,9 +1431,7 @@ static void ceph_vmtruncate_work(struct work_struct *work)
 	struct inode *inode = &ci->vfs_inode;
 
 	dout("vmtruncate_work %p\n", inode);
-	mutex_lock(&inode->i_mutex);
 	__ceph_do_pending_vmtruncate(inode);
-	mutex_unlock(&inode->i_mutex);
 	iput(inode);
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
  2013-01-04 11:34 ` [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work Yan, Zheng
@ 2013-01-06  6:00   ` Sage Weil
  2013-01-06  7:54     ` Yan, Zheng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:00 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Fri, 4 Jan 2013, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
> So ceph_get_caps() can be called while i_mutex is locked. If there is
> pending vmtruncate, ceph_get_caps() will wait for it to complete, but
> ceph_vmtruncate_work() is blocked by the locked i_mutex.

Hmm... :/

> There are several places that call __ceph_do_pending_vmtruncate()
> without holding the i_mutex, I think it's OK to not acquire the i_mutex
> in ceph_vmtruncate_work()

The intention was that that function woudl only be called under i_mutex.  
I did a quick look through the callers and that appears to be the case 
(for things llseek and setattr, the vfs should be taking i_mutex).

IIRC, this is to serialize the page cache truncation with truncate 
operations; this work can only be sanely done under i_mutex, so we defer 
it to the work queue or next person who takes i_mutex and cares about the 
mapping and i_size being consistent.

What was the deadlock you observed?

sage


> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/inode.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 81613bc..c895c7f 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1420,7 +1420,7 @@ out:
>  
>  
>  /*
> - * called by trunc_wq; take i_mutex ourselves
> + * called by trunc_wq;
>   *
>   * We also truncate in a separate thread as well.
>   */
> @@ -1431,9 +1431,7 @@ static void ceph_vmtruncate_work(struct work_struct *work)
>  	struct inode *inode = &ci->vfs_inode;
>  
>  	dout("vmtruncate_work %p\n", inode);
> -	mutex_lock(&inode->i_mutex);
>  	__ceph_do_pending_vmtruncate(inode);
> -	mutex_unlock(&inode->i_mutex);
>  	iput(inode);
>  }
>  
> -- 
> 1.7.11.7
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] ceph: re-calculate truncate_size for strip object
  2013-01-04 11:34 ` [PATCH 1/6] ceph: re-calculate truncate_size for strip object Yan, Zheng
@ 2013-01-06  6:04   ` Sage Weil
  0 siblings, 0 replies; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:04 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

Reviewed-by: Sage Weil <sage@inktank.com>

On Fri, 4 Jan 2013, Yan, Zheng wrote:

> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> Otherwise osd may truncate the object to larger size.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  net/ceph/osd_client.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index eb9a444..267f183 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -76,8 +76,16 @@ int ceph_calc_raw_layout(struct ceph_osd_client *osdc,
>  		     orig_len - *plen, off, *plen);
>  
>  	if (op_has_extent(op->op)) {
> +		u32 osize = le32_to_cpu(layout->fl_object_size);
>  		op->extent.offset = objoff;
>  		op->extent.length = objlen;
> +		if (op->extent.truncate_size <= off - objoff) {
> +			op->extent.truncate_size = 0;
> +		} else {
> +			op->extent.truncate_size -= off - objoff;
> +			if (op->extent.truncate_size > osize)
> +				op->extent.truncate_size = osize;
> +		}
>  	}
>  	req->r_num_pages = calc_pages_for(off, *plen);
>  	req->r_page_alignment = off & ~PAGE_MASK;
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps
  2013-01-04 11:34 ` [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps Yan, Zheng
@ 2013-01-06  6:05   ` Sage Weil
  0 siblings, 0 replies; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:05 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

Reviewed-by: Sage Weil <sage@inktank.com>

On Fri, 4 Jan 2013, Yan, Zheng wrote:

> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a1d9bb3..a9fe2d5 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -611,8 +611,16 @@ retry:
>  
>  	if (flags & CEPH_CAP_FLAG_AUTH)
>  		ci->i_auth_cap = cap;
> -	else if (ci->i_auth_cap == cap)
> +	else if (ci->i_auth_cap == cap) {
>  		ci->i_auth_cap = NULL;
> +		spin_lock(&mdsc->cap_dirty_lock);
> +		if (!list_empty(&ci->i_dirty_item)) {
> +			dout(" moving %p to cap_dirty_migrating\n", inode);
> +			list_move(&ci->i_dirty_item,
> +				  &mdsc->cap_dirty_migrating);
> +		}
> +		spin_unlock(&mdsc->cap_dirty_lock);
> +	}
>  
>  	dout("add_cap inode %p (%llx.%llx) cap %p %s now %s seq %d mds%d\n",
>  	     inode, ceph_vinop(inode), cap, ceph_cap_string(issued),
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS
  2013-01-04 11:34 ` [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS Yan, Zheng
@ 2013-01-06  6:09   ` Sage Weil
  2013-01-06  8:49     ` Yan, Zheng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:09 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Fri, 4 Jan 2013, Yan, Zheng wrote:

> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> Allow revoking duplicated caps issued by non-auth MDS if these caps
> are also issued by auth MDS.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index a9fe2d5..c90b245 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -1468,7 +1468,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
>  	struct ceph_mds_client *mdsc = fsc->mdsc;
>  	struct inode *inode = &ci->vfs_inode;
>  	struct ceph_cap *cap;
> -	int file_wanted, used;
> +	int file_wanted, used, cap_used;
>  	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
>  	int issued, implemented, want, retain, revoking, flushing = 0;
>  	int mds = -1;   /* keep track of how far we've gone through i_caps list
> @@ -1577,6 +1577,10 @@ retry_locked:
>  		     ceph_cap_string(cap->implemented),
>  		     ceph_cap_string(revoking));
>  
> +		cap_used = used;
> +		if (ci->i_auth_cap && cap != ci->i_auth_cap)
> +			cap_used &= ~ci->i_auth_cap->issued;
> +

Let's move this above the dout line, and include used_caps in that, or 
else the next person to parse the debug output will be very confused.  

Otherwise, looks good!

Reviewed-by: Sage Weil <sage@inktank.com>

>  		if (cap == ci->i_auth_cap &&
>  		    (cap->issued & CEPH_CAP_FILE_WR)) {
>  			/* request larger max_size from MDS? */
> @@ -1601,7 +1605,7 @@ retry_locked:
>  		}
>  
>  		/* completed revocation? going down and there are no caps? */
> -		if (revoking && (revoking & used) == 0) {
> +		if (revoking && (revoking & cap_used) == 0) {
>  			dout("completed revocation of %s\n",
>  			     ceph_cap_string(cap->implemented & ~cap->issued));
>  			goto ack;
> @@ -1678,8 +1682,8 @@ ack:
>  		sent++;
>  
>  		/* __send_cap drops i_ceph_lock */
> -		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, used, want,
> -				      retain, flushing, NULL);
> +		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, cap_used,
> +				      want, retain, flushing, NULL);
>  		goto retry; /* retake i_ceph_lock and restart our cap scan. */
>  	}
>  
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] ceph: allocate cap_release message when receiving cap import
  2013-01-04 11:34 ` [PATCH 4/6] ceph: allocate cap_release message when receiving cap import Yan, Zheng
@ 2013-01-06  6:12   ` Sage Weil
  0 siblings, 0 replies; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:12 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

Reviewed-by: Sage Weil <sage@inktank.com>

This is one of those annoying situations where we may not have 
preallocated the memory and in the message handler may not be able to.  
It should be easy to find when we come back to clean that up, though, 
since almost all ceph_add_cap_releases() are similarly affected.


On Fri, 4 Jan 2013, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> When client wants to release an imported cap, it's possible there
> is no reserved cap_release message in corresponding mds session.
> so __queue_cap_release causes kernel panic.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index c90b245..7e90299 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2832,6 +2832,9 @@ void ceph_handle_caps(struct ceph_mds_session *session,
>  	dout(" mds%d seq %lld cap seq %u\n", session->s_mds, session->s_seq,
>  	     (unsigned)seq);
>  
> +	if (op == CEPH_CAP_OP_IMPORT)
> +		ceph_add_cap_releases(mdsc, session);
> +
>  	/* lookup ino */
>  	inode = ceph_find_inode(sb, vino);
>  	ci = ceph_inode(inode);
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-04 11:34 ` [PATCH 5/6] ceph: check mds_wanted for imported cap Yan, Zheng
@ 2013-01-06  6:20   ` Sage Weil
  2013-01-06  8:21     ` Yan, Zheng
  2013-01-06  8:54     ` Yan, Zheng
  0 siblings, 2 replies; 22+ messages in thread
From: Sage Weil @ 2013-01-06  6:20 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Fri, 4 Jan 2013, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> 
> The MDS may have incorrect wanted caps after importing caps. So the
> client should check the value mds has and send cap update if necessary.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 7e90299..16c10f8 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2425,10 +2425,12 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
>  	     ceph_cap_string(used),
>  	     ceph_cap_string(dirty));
>  	if (wanted != le32_to_cpu(grant->wanted)) {
> -		dout("mds wanted %s -> %s\n",
> -		     ceph_cap_string(le32_to_cpu(grant->wanted)),
> -		     ceph_cap_string(wanted));
> -		grant->wanted = cpu_to_le32(wanted);

Doh, this was a holdover from when we used to re-use the incoming message 
and send it back out over the wire.

> +		dout("mds wanted = %s\n",
> +		     ceph_cap_string(le32_to_cpu(grant->wanted)));

Any reason to drop the "old -> new" output?

> +		/* imported cap may not have correct mds_wanted */
> +		if (cap == ci->i_auth_cap &&
> +		    (wanted & ~(cap->mds_wanted | cap->issued)))
> +			check_caps = 1;

I think we want

		if (cap == ci->i_auth_cap)
			check_caps = 1;

If we want more caps, check_caps will request immediately.  If we have 
extra, we'll put the cap on the delay list.

sage


>  	}
>  
>  	cap->seq = seq;
> -- 
> 1.7.11.7
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
  2013-01-06  6:00   ` Sage Weil
@ 2013-01-06  7:54     ` Yan, Zheng
  2013-01-07  5:05       ` Sage Weil
  0 siblings, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-06  7:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 01/06/2013 02:00 PM, Sage Weil wrote:
> On Fri, 4 Jan 2013, Yan, Zheng wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
>> So ceph_get_caps() can be called while i_mutex is locked. If there is
>> pending vmtruncate, ceph_get_caps() will wait for it to complete, but
>> ceph_vmtruncate_work() is blocked by the locked i_mutex.
> 
> Hmm... :/
> 
>> There are several places that call __ceph_do_pending_vmtruncate()
>> without holding the i_mutex, I think it's OK to not acquire the i_mutex
>> in ceph_vmtruncate_work()
> 
> The intention was that that function woudl only be called under i_mutex.  
> I did a quick look through the callers and that appears to be the case 
> (for things llseek and setattr, the vfs should be taking i_mutex).

both ceph_aio_read() and ceph_aio_write() call __ceph_do_pending_vmtruncate()
without holding the i_mutex

> 
> IIRC, this is to serialize the page cache truncation with truncate 
> operations; this work can only be sanely done under i_mutex, so we defer 
> it to the work queue or next person who takes i_mutex and cares about the 
> mapping and i_size being consistent.
> 
> What was the deadlock you observed?

generic_file_aio_write() locks the i_mutex, then indirectly calls ceph_get_caps()
through ceph_write_begin(). ceph_get_caps() wait for pending vmtruncate to complete,
but the work queue is blocked by the i_mutex.

Regards
Yan, Zheng

> 
> sage
> 
> 
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> ---
>>  fs/ceph/inode.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
>> index 81613bc..c895c7f 100644
>> --- a/fs/ceph/inode.c
>> +++ b/fs/ceph/inode.c
>> @@ -1420,7 +1420,7 @@ out:
>>  
>>  
>>  /*
>> - * called by trunc_wq; take i_mutex ourselves
>> + * called by trunc_wq;
>>   *
>>   * We also truncate in a separate thread as well.
>>   */
>> @@ -1431,9 +1431,7 @@ static void ceph_vmtruncate_work(struct work_struct *work)
>>  	struct inode *inode = &ci->vfs_inode;
>>  
>>  	dout("vmtruncate_work %p\n", inode);
>> -	mutex_lock(&inode->i_mutex);
>>  	__ceph_do_pending_vmtruncate(inode);
>> -	mutex_unlock(&inode->i_mutex);
>>  	iput(inode);
>>  }
>>  
>> -- 
>> 1.7.11.7
>>
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-06  6:20   ` Sage Weil
@ 2013-01-06  8:21     ` Yan, Zheng
  2013-01-07  5:06       ` Sage Weil
  2013-01-06  8:54     ` Yan, Zheng
  1 sibling, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-06  8:21 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 01/06/2013 02:20 PM, Sage Weil wrote:
> On Fri, 4 Jan 2013, Yan, Zheng wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> The MDS may have incorrect wanted caps after importing caps. So the
>> client should check the value mds has and send cap update if necessary.
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> ---
>>  fs/ceph/caps.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
>> index 7e90299..16c10f8 100644
>> --- a/fs/ceph/caps.c
>> +++ b/fs/ceph/caps.c
>> @@ -2425,10 +2425,12 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
>>  	     ceph_cap_string(used),
>>  	     ceph_cap_string(dirty));
>>  	if (wanted != le32_to_cpu(grant->wanted)) {
>> -		dout("mds wanted %s -> %s\n",
>> -		     ceph_cap_string(le32_to_cpu(grant->wanted)),
>> -		     ceph_cap_string(wanted));
>> -		grant->wanted = cpu_to_le32(wanted);
> 
> Doh, this was a holdover from when we used to re-use the incoming message 
> and send it back out over the wire.
> 
>> +		dout("mds wanted = %s\n",
>> +		     ceph_cap_string(le32_to_cpu(grant->wanted)));
> 
> Any reason to drop the "old -> new" output?
> 
will and it back in next patch

>> +		/* imported cap may not have correct mds_wanted */
>> +		if (cap == ci->i_auth_cap &&
>> +		    (wanted & ~(cap->mds_wanted | cap->issued)))
>> +			check_caps = 1;
> 
> I think we want
> 
> 		if (cap == ci->i_auth_cap)
> 			check_caps = 1;
probably, we need

if (le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT)
  check_caps = 1;

Regards
Yan, Zheng

> 
> If we want more caps, check_caps will request immediately.  If we have 
> extra, we'll put the cap on the delay list.
> 
> sage
> 
> 
>>  	}
>>  
>>  	cap->seq = seq;
>> -- 
>> 1.7.11.7
>>
>>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS
  2013-01-06  6:09   ` Sage Weil
@ 2013-01-06  8:49     ` Yan, Zheng
  0 siblings, 0 replies; 22+ messages in thread
From: Yan, Zheng @ 2013-01-06  8:49 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

how about below patch, I also rebased git://github.com/ukernel/ceph-client.git wip-ceph

----
From 19f0e26cb90c29bb22ad09769bfc0df03b3d07be Mon Sep 17 00:00:00 2001
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Fri, 4 Jan 2013 14:37:57 +0800
Subject: [PATCH 3/5] ceph: allow revoking duplicated caps issued by non-auth
 MDS

Allow revoking duplicated caps issued by non-auth MDS if these caps
are also issued by auth MDS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index a9fe2d5..76b1923 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1468,7 +1468,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags,
 	struct ceph_mds_client *mdsc = fsc->mdsc;
 	struct inode *inode = &ci->vfs_inode;
 	struct ceph_cap *cap;
-	int file_wanted, used;
+	int file_wanted, used, cap_used;
 	int took_snap_rwsem = 0;             /* true if mdsc->snap_rwsem held */
 	int issued, implemented, want, retain, revoking, flushing = 0;
 	int mds = -1;   /* keep track of how far we've gone through i_caps list
@@ -1571,9 +1571,14 @@ retry_locked:
 
 		/* NOTE: no side-effects allowed, until we take s_mutex */
 
+		cap_used = used;
+		if (ci->i_auth_cap && cap != ci->i_auth_cap)
+			cap_used &= ~ci->i_auth_cap->issued;
+
 		revoking = cap->implemented & ~cap->issued;
-		dout(" mds%d cap %p issued %s implemented %s revoking %s\n",
+		dout(" mds%d cap %p used %s issued %s implemented %s revoking %s\n",
 		     cap->mds, cap, ceph_cap_string(cap->issued),
+		     ceph_cap_string(cap_used),
 		     ceph_cap_string(cap->implemented),
 		     ceph_cap_string(revoking));
 
@@ -1601,7 +1606,7 @@ retry_locked:
 		}
 
 		/* completed revocation? going down and there are no caps? */
-		if (revoking && (revoking & used) == 0) {
+		if (revoking && (revoking & cap_used) == 0) {
 			dout("completed revocation of %s\n",
 			     ceph_cap_string(cap->implemented & ~cap->issued));
 			goto ack;
@@ -1678,8 +1683,8 @@ ack:
 		sent++;
 
 		/* __send_cap drops i_ceph_lock */
-		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, used, want,
-				      retain, flushing, NULL);
+		delayed += __send_cap(mdsc, cap, CEPH_CAP_OP_UPDATE, cap_used,
+				      want, retain, flushing, NULL);
 		goto retry; /* retake i_ceph_lock and restart our cap scan. */
 	}
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-06  6:20   ` Sage Weil
  2013-01-06  8:21     ` Yan, Zheng
@ 2013-01-06  8:54     ` Yan, Zheng
  2013-01-07  5:11       ` Sage Weil
  1 sibling, 1 reply; 22+ messages in thread
From: Yan, Zheng @ 2013-01-06  8:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

how about below patch, also updated git://github.com/ukernel/ceph-client.git wip-ceph

Reagrds
Yan, Zheng
---
From 6fdcf8d71239dac7b26f50623ee5986f39552b4a Mon Sep 17 00:00:00 2001
From: "Yan, Zheng" <zheng.z.yan@intel.com>
Date: Fri, 4 Jan 2013 15:30:10 +0800
Subject: [PATCH 5/5] ceph: check mds_wanted for imported cap

The MDS may have incorrect wanted caps after importing caps. So the
client should check the value mds has and send cap update if necessary.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
 fs/ceph/caps.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 40b5bbe..1e1e020 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2429,7 +2429,9 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
 		dout("mds wanted %s -> %s\n",
 		     ceph_cap_string(le32_to_cpu(grant->wanted)),
 		     ceph_cap_string(wanted));
-		grant->wanted = cpu_to_le32(wanted);
+		/* imported cap may not have correct mds_wanted */
+		if (le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT)
+			check_caps = 1;
 	}
 
 	cap->seq = seq;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
  2013-01-06  7:54     ` Yan, Zheng
@ 2013-01-07  5:05       ` Sage Weil
  2013-01-07  5:31         ` Yan, Zheng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2013-01-07  5:05 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Sun, 6 Jan 2013, Yan, Zheng wrote:
> On 01/06/2013 02:00 PM, Sage Weil wrote:
> > On Fri, 4 Jan 2013, Yan, Zheng wrote:
> >> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> >>
> >> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
> >> So ceph_get_caps() can be called while i_mutex is locked. If there is
> >> pending vmtruncate, ceph_get_caps() will wait for it to complete, but
> >> ceph_vmtruncate_work() is blocked by the locked i_mutex.
> > 
> > Hmm... :/
> > 
> >> There are several places that call __ceph_do_pending_vmtruncate()
> >> without holding the i_mutex, I think it's OK to not acquire the i_mutex
> >> in ceph_vmtruncate_work()
> > 
> > The intention was that that function woudl only be called under i_mutex.  
> > I did a quick look through the callers and that appears to be the case 
> > (for things llseek and setattr, the vfs should be taking i_mutex).
> 
> both ceph_aio_read() and ceph_aio_write() call __ceph_do_pending_vmtruncate()
> without holding the i_mutex

Hrm.. that's now how I remember it working.  I'm pretty sure i_mutex was 
providing the serialization around truncation.

> > IIRC, this is to serialize the page cache truncation with truncate 
> > operations; this work can only be sanely done under i_mutex, so we defer 
> > it to the work queue or next person who takes i_mutex and cares about the 
> > mapping and i_size being consistent.
> > 
> > What was the deadlock you observed?
> 
> generic_file_aio_write() locks the i_mutex, then indirectly calls ceph_get_caps()
> through ceph_write_begin(). ceph_get_caps() wait for pending vmtruncate to complete,
> but the work queue is blocked by the i_mutex.

...but it seems clear that that's not a workable solution.  I suspect the 
right solution here is to introduce an inner i_trunc_mutex in the ceph 
inode and use that to protect truncation events in the page cache.  That 
will touch a lot of different code paths, unfortunately, but I think 
something like that is necessary if we need to block with i_mutex held by 
the VFS.

sage


> 
> Regards
> Yan, Zheng
> 
> > 
> > sage
> > 
> > 
> >>
> >> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> >> ---
> >>  fs/ceph/inode.c | 4 +---
> >>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> >> index 81613bc..c895c7f 100644
> >> --- a/fs/ceph/inode.c
> >> +++ b/fs/ceph/inode.c
> >> @@ -1420,7 +1420,7 @@ out:
> >>  
> >>  
> >>  /*
> >> - * called by trunc_wq; take i_mutex ourselves
> >> + * called by trunc_wq;
> >>   *
> >>   * We also truncate in a separate thread as well.
> >>   */
> >> @@ -1431,9 +1431,7 @@ static void ceph_vmtruncate_work(struct work_struct *work)
> >>  	struct inode *inode = &ci->vfs_inode;
> >>  
> >>  	dout("vmtruncate_work %p\n", inode);
> >> -	mutex_lock(&inode->i_mutex);
> >>  	__ceph_do_pending_vmtruncate(inode);
> >> -	mutex_unlock(&inode->i_mutex);
> >>  	iput(inode);
> >>  }
> >>  
> >> -- 
> >> 1.7.11.7
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-06  8:21     ` Yan, Zheng
@ 2013-01-07  5:06       ` Sage Weil
  0 siblings, 0 replies; 22+ messages in thread
From: Sage Weil @ 2013-01-07  5:06 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Sun, 6 Jan 2013, Yan, Zheng wrote:
> On 01/06/2013 02:20 PM, Sage Weil wrote:
> > On Fri, 4 Jan 2013, Yan, Zheng wrote:
> >> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> >>
> >> The MDS may have incorrect wanted caps after importing caps. So the
> >> client should check the value mds has and send cap update if necessary.
> >>
> >> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> >> ---
> >>  fs/ceph/caps.c | 10 ++++++----
> >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> >> index 7e90299..16c10f8 100644
> >> --- a/fs/ceph/caps.c
> >> +++ b/fs/ceph/caps.c
> >> @@ -2425,10 +2425,12 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
> >>  	     ceph_cap_string(used),
> >>  	     ceph_cap_string(dirty));
> >>  	if (wanted != le32_to_cpu(grant->wanted)) {
> >> -		dout("mds wanted %s -> %s\n",
> >> -		     ceph_cap_string(le32_to_cpu(grant->wanted)),
> >> -		     ceph_cap_string(wanted));
> >> -		grant->wanted = cpu_to_le32(wanted);
> > 
> > Doh, this was a holdover from when we used to re-use the incoming message 
> > and send it back out over the wire.
> > 
> >> +		dout("mds wanted = %s\n",
> >> +		     ceph_cap_string(le32_to_cpu(grant->wanted)));
> > 
> > Any reason to drop the "old -> new" output?
> > 
> will and it back in next patch
> 
> >> +		/* imported cap may not have correct mds_wanted */
> >> +		if (cap == ci->i_auth_cap &&
> >> +		    (wanted & ~(cap->mds_wanted | cap->issued)))
> >> +			check_caps = 1;
> > 
> > I think we want
> > 
> > 		if (cap == ci->i_auth_cap)
> > 			check_caps = 1;
> probably, we need
> 
> if (le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT)
>   check_caps = 1;

That sounds right; I can't think offhand why the MDS would have an 
incorrect notion of wanted except on import.

sage


> 
> Regards
> Yan, Zheng
> 
> > 
> > If we want more caps, check_caps will request immediately.  If we have 
> > extra, we'll put the cap on the delay list.
> > 
> > sage
> > 
> > 
> >>  	}
> >>  
> >>  	cap->seq = seq;
> >> -- 
> >> 1.7.11.7
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-06  8:54     ` Yan, Zheng
@ 2013-01-07  5:11       ` Sage Weil
  2013-01-07  5:39         ` Yan, Zheng
  0 siblings, 1 reply; 22+ messages in thread
From: Sage Weil @ 2013-01-07  5:11 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel

On Sun, 6 Jan 2013, Yan, Zheng wrote:
> how about below patch, also updated git://github.com/ukernel/ceph-client.git wip-ceph

The branch looks good.  Once Alex sorts out testing-next tomorrow we can 
pull these in.  If you don't mind adding the Reviewed-by tags, that'd be 
helpful, otherwise I'll do it tomorrow.

Thanks!
sage


> 
> Reagrds
> Yan, Zheng
> ---
> >From 6fdcf8d71239dac7b26f50623ee5986f39552b4a Mon Sep 17 00:00:00 2001
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
> Date: Fri, 4 Jan 2013 15:30:10 +0800
> Subject: [PATCH 5/5] ceph: check mds_wanted for imported cap
> 
> The MDS may have incorrect wanted caps after importing caps. So the
> client should check the value mds has and send cap update if necessary.
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  fs/ceph/caps.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> index 40b5bbe..1e1e020 100644
> --- a/fs/ceph/caps.c
> +++ b/fs/ceph/caps.c
> @@ -2429,7 +2429,9 @@ static void handle_cap_grant(struct inode *inode, struct ceph_mds_caps *grant,
>  		dout("mds wanted %s -> %s\n",
>  		     ceph_cap_string(le32_to_cpu(grant->wanted)),
>  		     ceph_cap_string(wanted));
> -		grant->wanted = cpu_to_le32(wanted);
> +		/* imported cap may not have correct mds_wanted */
> +		if (le32_to_cpu(grant->op) == CEPH_CAP_OP_IMPORT)
> +			check_caps = 1;
>  	}
>  
>  	cap->seq = seq;
> -- 
> 1.7.11.7
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work
  2013-01-07  5:05       ` Sage Weil
@ 2013-01-07  5:31         ` Yan, Zheng
  0 siblings, 0 replies; 22+ messages in thread
From: Yan, Zheng @ 2013-01-07  5:31 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 01/07/2013 01:05 PM, Sage Weil wrote:
> On Sun, 6 Jan 2013, Yan, Zheng wrote:
>> On 01/06/2013 02:00 PM, Sage Weil wrote:
>>> On Fri, 4 Jan 2013, Yan, Zheng wrote:
>>>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>>>
>>>> In commit 22cddde104, ceph_get_caps() was moved into ceph_write_begin().
>>>> So ceph_get_caps() can be called while i_mutex is locked. If there is
>>>> pending vmtruncate, ceph_get_caps() will wait for it to complete, but
>>>> ceph_vmtruncate_work() is blocked by the locked i_mutex.
>>>
>>> Hmm... :/
>>>
>>>> There are several places that call __ceph_do_pending_vmtruncate()
>>>> without holding the i_mutex, I think it's OK to not acquire the i_mutex
>>>> in ceph_vmtruncate_work()
>>>
>>> The intention was that that function woudl only be called under i_mutex.  
>>> I did a quick look through the callers and that appears to be the case 
>>> (for things llseek and setattr, the vfs should be taking i_mutex).
>>
>> both ceph_aio_read() and ceph_aio_write() call __ceph_do_pending_vmtruncate()
>> without holding the i_mutex
> 
> Hrm.. that's now how I remember it working.  I'm pretty sure i_mutex was 
> providing the serialization around truncation.
> 
>>> IIRC, this is to serialize the page cache truncation with truncate 
>>> operations; this work can only be sanely done under i_mutex, so we defer 
>>> it to the work queue or next person who takes i_mutex and cares about the 
>>> mapping and i_size being consistent.
>>>
>>> What was the deadlock you observed?
>>
>> generic_file_aio_write() locks the i_mutex, then indirectly calls ceph_get_caps()
>> through ceph_write_begin(). ceph_get_caps() wait for pending vmtruncate to complete,
>> but the work queue is blocked by the i_mutex.
> 
> ...but it seems clear that that's not a workable solution.  I suspect the 
> right solution here is to introduce an inner i_trunc_mutex in the ceph 
> inode and use that to protect truncation events in the page cache.  That 
> will touch a lot of different code paths, unfortunately, but I think 
> something like that is necessary if we need to block with i_mutex held by 
> the VFS.
> 

Maybe ceph_vmtruncate_work() can wait until no one is using write cap. I think it's safe to
truncate page cache in that case.

Regards
Yan, Zheng


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] ceph: check mds_wanted for imported cap
  2013-01-07  5:11       ` Sage Weil
@ 2013-01-07  5:39         ` Yan, Zheng
  0 siblings, 0 replies; 22+ messages in thread
From: Yan, Zheng @ 2013-01-07  5:39 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 01/07/2013 01:11 PM, Sage Weil wrote:
> On Sun, 6 Jan 2013, Yan, Zheng wrote:
>> how about below patch, also updated git://github.com/ukernel/ceph-client.git wip-ceph
> 
> The branch looks good.  Once Alex sorts out testing-next tomorrow we can 
> pull these in.  If you don't mind adding the Reviewed-by tags, that'd be 
> helpful, otherwise I'll do it tomorrow.
> 
done.

Regards
Yan, Zheng


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2013-01-07  5:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-01-04 11:34 [PATCH 0/6] fixes for cephfs kernel client Yan, Zheng
2013-01-04 11:34 ` [PATCH 1/6] ceph: re-calculate truncate_size for strip object Yan, Zheng
2013-01-06  6:04   ` Sage Weil
2013-01-04 11:34 ` [PATCH 2/6] ceph: move dirty inode to migrating list when clearing auth caps Yan, Zheng
2013-01-06  6:05   ` Sage Weil
2013-01-04 11:34 ` [PATCH 3/6] ceph: allow revoking duplicated caps issued by non-auth MDS Yan, Zheng
2013-01-06  6:09   ` Sage Weil
2013-01-06  8:49     ` Yan, Zheng
2013-01-04 11:34 ` [PATCH 4/6] ceph: allocate cap_release message when receiving cap import Yan, Zheng
2013-01-06  6:12   ` Sage Weil
2013-01-04 11:34 ` [PATCH 5/6] ceph: check mds_wanted for imported cap Yan, Zheng
2013-01-06  6:20   ` Sage Weil
2013-01-06  8:21     ` Yan, Zheng
2013-01-07  5:06       ` Sage Weil
2013-01-06  8:54     ` Yan, Zheng
2013-01-07  5:11       ` Sage Weil
2013-01-07  5:39         ` Yan, Zheng
2013-01-04 11:34 ` [PATCH 6/6] ceph: don't acquire i_mutex ceph_vmtruncate_work Yan, Zheng
2013-01-06  6:00   ` Sage Weil
2013-01-06  7:54     ` Yan, Zheng
2013-01-07  5:05       ` Sage Weil
2013-01-07  5:31         ` Yan, Zheng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.