From: Alex Elder <alex.elder@linaro.org>
To: "Yan, Zheng" <zheng.z.yan@intel.com>
Cc: ceph-devel@vger.kernel.org, sage@inktank.com
Subject: Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received
Date: Tue, 02 Jul 2013 08:07:41 -0500
Message-ID: <51D2D09D.8070808@linaro.org>
In-Reply-To: <1372056089-11603-1-git-send-email-zheng.z.yan@intel.com>

On 06/24/2013 01:41 AM, Yan, Zheng wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>

Sorry it took so long; I intended to take a look at this
for you sooner.

I would also like to thank you for this nice clear
description.  It made it very easy to understand
why you were proposing the change, and to focus in
on exactly which parts of the design it's affecting.

> We can't use !req->r_sent to check if OSD request is sent for the
> first time, this is because __cancel_request() zeros req->r_sent
> when OSD map changes. Rather than adding a new variable to struct

You're right.

> ceph_osd_request to indicate if it's sent for the first time, We
> can call the unsafe callback only when unsafe OSD reply is received.
> If OSD's first reply is safe, just skip calling the unsafe callback.

This seems reasonable, but it differs from how I understood
"unsafe."  I may be wrong, though, and the way the file
system uses this might already address my concern.

The way I interpreted "unsafe" was simply that it was possible
a write *could* have been made persistent, even if the client
doesn't know about it.  A request could have made it to its
target osd, been written, and the response could be in flight
at the point something (maybe a router?) crashes and the response
gets lost.  During that time window, the stored data may not be
in a state that's consistent with the client's view of it.

So I thought of "unsafe" as meaning that a write is in flight,
and until we get a successful response, the storage might
contain the old data or it might contain the new data; the
client has no way of knowing which.

With that interpretation, a request becomes unsafe the
instant it leaves the client, and becomes safe again
the instant a response arrives.
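
To make the difference concrete, here is a minimal sketch of
the two readings of "unsafe."  The enum and function names are
hypothetical and do not exist in libceph.  It also assumes the
response that ends the unsafe window is the on-disk reply,
which is where the pre-patch complete_request() cleared it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle of one write request, in order. */
enum req_stage {
	REQ_QUEUED,	/* built, not yet handed to the messenger */
	REQ_SENT,	/* left the client, no reply seen yet */
	REQ_ACKED,	/* first reply seen, not yet on disk */
	REQ_ONDISK,	/* on-disk ("safe") reply seen */
};

/* Reading described above: unsafe from the moment it is sent. */
static bool unsafe_while_in_flight(enum req_stage s)
{
	return s == REQ_SENT || s == REQ_ACKED;
}

/* Reading in the patch: unsafe only after an unsafe reply. */
static bool unsafe_after_first_reply(enum req_stage s)
{
	return s == REQ_ACKED;
}

/* Self-check: the two readings disagree only for REQ_SENT. */
static void check_readings(void)
{
	for (int s = REQ_QUEUED; s <= REQ_ONDISK; s++) {
		bool differ = unsafe_while_in_flight(s) !=
			      unsafe_after_first_reply(s);

		assert(differ == (s == REQ_SENT));
	}
}
```

The only stage where the two disagree is REQ_SENT, which is
exactly the window my question is about.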

If my interpretation is correct, this change is wrong,
because it no longer marks a request unsafe while it is
still in flight.

But I may be wrong, and there may really be no need to
worry about a possible modification of data until after
an acknowledgement response is received.  In that case,
I've looked at your patch and it looks good.

Can you explain why I'm wrong about what is "unsafe?"

					-Alex

> The purpose of unsafe callback is adding unsafe request to a list,
> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
> to wait for a write(2) that hasn't returned yet. So it's OK to add
> request to the unsafe list when the first OSD reply is received.
> (ceph_sync_write() returns after receiving the first OSD reply)
> 
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
>  net/ceph/osd_client.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 540dd29..dd47889 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>  
>  	ceph_msg_get(req->r_request); /* send consumes a ref */
>  
> -	/* Mark the request unsafe if this is the first timet's being sent. */
> -
> -	if (!req->r_sent && req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, true);
>  	req->r_sent = req->r_osd->o_incarnation;
>  
>  	ceph_con_send(&req->r_osd->o_con, req->r_request);
> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>  
>  static void complete_request(struct ceph_osd_request *req)
>  {
> -	if (req->r_unsafe_callback)
> -		req->r_unsafe_callback(req, false);
>  	complete_all(&req->r_safe_completion);  /* fsync waiter */
>  }
>  
> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>  	mutex_unlock(&osdc->request_mutex);
>  
>  	if (!already_completed) {
> +		if (req->r_unsafe_callback &&
> +		    result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
> +			req->r_unsafe_callback(req, true);
>  		if (req->r_callback)
>  			req->r_callback(req, msg);
>  		else
>  			complete_all(&req->r_completion);
>  	}
>  
> -	if (flags & CEPH_OSD_FLAG_ONDISK)
> +	if (flags & CEPH_OSD_FLAG_ONDISK) {
> +		if (req->r_unsafe_callback && already_completed)
> +			req->r_unsafe_callback(req, false);
>  		complete_request(req);
> +	}
>  
>  done:
>  	dout("req=%p req->r_linger=%d\n", req, req->r_linger);
> 
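
The callback ordering in the handle_reply() hunk above can be
modeled in userspace roughly as follows.  This is only a
sketch: struct model_req, model_handle_reply() and FLAG_ONDISK
are hypothetical stand-ins, counters replace the real
r_unsafe_callback (assumed non-NULL), and since the hunk does
not show how already_completed is computed, the sketch assumes
it means "an earlier reply was already delivered."

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FLAG_ONDISK 0x1u	/* stand-in for CEPH_OSD_FLAG_ONDISK */

struct model_req {
	bool got_reply;		/* an earlier reply was delivered */
	int unsafe_calls;	/* times callback(req, true) would run */
	int safe_calls;		/* times callback(req, false) would run */
	bool safe_completed;	/* complete_request() would have run */
};

static void model_handle_reply(struct model_req *req, int result,
			       unsigned int flags)
{
	bool already_completed;

	assert(req != NULL);
	already_completed = req->got_reply;	/* assumption, see above */
	req->got_reply = true;

	/* First reply, successful, not yet on disk: mark unsafe. */
	if (!already_completed && result >= 0 && !(flags & FLAG_ONDISK))
		req->unsafe_calls++;

	if (flags & FLAG_ONDISK) {
		/* Clear unsafe only if an earlier reply came first. */
		if (already_completed)
			req->safe_calls++;
		req->safe_completed = true;	/* wakes the fsync waiter */
	}
}
```

Feeding the model an unsafe reply followed by an on-disk reply
yields one unsafe call and one safe call; feeding it a single
on-disk reply yields neither, matching "if OSD's first reply
is safe, just skip calling the unsafe callback."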

