* [PATCH] libceph: fix "safe" completion
@ 2013-04-30 13:17 Yan, Zheng
2013-04-30 17:04 ` Gregory Farnum
0 siblings, 1 reply; 3+ messages in thread
From: Yan, Zheng @ 2013-04-30 13:17 UTC (permalink / raw)
To: ceph-devel; +Cc: elder, Yan, Zheng
From: "Yan, Zheng" <zheng.z.yan@intel.com>
complete_request() is never called if the first OSD replay doesn't
have ONDISK flag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
---
include/linux/ceph/osd_client.h | 1 -
net/ceph/osd_client.c | 16 ++++++++--------
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
index 4191cd2..70f79fb 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -145,7 +145,6 @@ struct ceph_osd_request {
s32 r_reply_op_result[CEPH_OSD_MAX_OP];
int r_got_reply;
int r_linger;
- int r_completed;
struct ceph_osd_client *r_osdc;
struct kref r_kref;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 57d8db5..8a5fc37 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -1522,6 +1522,8 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
for (i = 0; i < numops; i++)
req->r_reply_op_result[i] = ceph_decode_32(&p);
+ already_completed = req->r_got_reply;
+
if (!req->r_got_reply) {
req->r_result = result;
@@ -1552,16 +1554,14 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
((flags & CEPH_OSD_FLAG_WRITE) == 0))
__unregister_request(osdc, req);
- already_completed = req->r_completed;
- req->r_completed = 1;
mutex_unlock(&osdc->request_mutex);
- if (already_completed)
- goto done;
- if (req->r_callback)
- req->r_callback(req, msg);
- else
- complete_all(&req->r_completion);
+ if (!already_completed) {
+ if (req->r_callback)
+ req->r_callback(req, msg);
+ else
+ complete_all(&req->r_completion);
+ }
if (flags & CEPH_OSD_FLAG_ONDISK)
complete_request(req);
--
1.8.1.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] libceph: fix "safe" completion
2013-04-30 13:17 [PATCH] libceph: fix "safe" completion Yan, Zheng
@ 2013-04-30 17:04 ` Gregory Farnum
2013-05-01 10:13 ` Yan, Zheng
0 siblings, 1 reply; 3+ messages in thread
From: Gregory Farnum @ 2013-04-30 17:04 UTC (permalink / raw)
To: Yan, Zheng; +Cc: ceph-devel@vger.kernel.org, Alex Elder
I was looking at this code recently and had similar thoughts; I've
included the relevant parts of a conversation on this topic below. In
light of that I'm not sure if this patch is appropriate or not (it
might be; it certainly needs some cleanup IMO).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
---------- Forwarded message ----------
From: Sage Weil <sage@inktank.com>
Date: Thu, Apr 18, 2013 at 1:41 PM
Subject: Re: use of osd_client's r_callback
To: Gregory Farnum <greg@inktank.com>
Cc: Alex Elder <elder@inktank.com>, Josh Durgin <josh.durgin@inktank.com>
It is a bit fragile, but not broken AFAICS. We only se the FLAG_ACK flag
on the request (which controls whether we get an ACK in addition to a
COMMIT reply from the osd) for the sync write path (O_DIRECT and O_SYNC
writes).. and in that case, the callback is expecting to be called on the
ack and the commit is just clenaing up that safe/unsafe list for fsync.
I think we could clean this code up so that the names and meanings of the
callbacks are more consistent (e.g., r_commit_callback vs
r_ack_callback)...
s
On Thu, 18 Apr 2013, Gregory Farnum wrote:
> The recent series of patches to deal with deadlocks has me spending a
> little bit more time looking at the kernel client, especially after I
> didn't understand some of the comments going back and forth about safe
> and unsafe lists and callbacks (specifically, compared to their
> meanings in userspace). This investigation has left me concerned, and
> maybe confused, about when we're reporting writes as done.
>
> In particular, following on from change to "unsafe" so it no longer
> indicates any communication from the OSD (which might be fine?), I
> spent some time looking at how the other callback is handled. As best
> I can tell, the osdc's "r_callback" is used, from the caller's
> perspective (drivers/block/rbd.c and fs/ceph/*), to indicate that a
> write is permanent and it can forget the data/unlock the pages/etc ?
> this matches what the userspace clients do with a "safe" response.
> However, in looking at the handle_reply() function, it sure looks to
> me as if r_callback() is invoked on the first reply from an OSD ?
> which can in many cases be an "unsafe" reply meaning that the data
> hasn't been persisted yet! The only reference I can find to it seeing
> if the reply is a safe or unsafe reply happens during duplicate
> response checking, on the second (or subsequent) responses.
>
> Am I missing something here, or is the kernel client broken under OSD
> failures? (I note that these issues aren't going to turn up when the
> OSDs are running xfs/ext4 because on those FSes the first reply will
> always be the "safe" response, which explains why we haven't detected
> them if there are no other reasons.)
> -Greg
>
>
On Tue, Apr 30, 2013 at 6:17 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>
> complete_request() is never called if the first OSD replay doesn't
> have ONDISK flag.
>
> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
> ---
> include/linux/ceph/osd_client.h | 1 -
> net/ceph/osd_client.c | 16 ++++++++--------
> 2 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
> index 4191cd2..70f79fb 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -145,7 +145,6 @@ struct ceph_osd_request {
> s32 r_reply_op_result[CEPH_OSD_MAX_OP];
> int r_got_reply;
> int r_linger;
> - int r_completed;
>
> struct ceph_osd_client *r_osdc;
> struct kref r_kref;
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 57d8db5..8a5fc37 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1522,6 +1522,8 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> for (i = 0; i < numops; i++)
> req->r_reply_op_result[i] = ceph_decode_32(&p);
>
> + already_completed = req->r_got_reply;
> +
> if (!req->r_got_reply) {
>
> req->r_result = result;
> @@ -1552,16 +1554,14 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
> ((flags & CEPH_OSD_FLAG_WRITE) == 0))
> __unregister_request(osdc, req);
>
> - already_completed = req->r_completed;
> - req->r_completed = 1;
> mutex_unlock(&osdc->request_mutex);
> - if (already_completed)
> - goto done;
>
> - if (req->r_callback)
> - req->r_callback(req, msg);
> - else
> - complete_all(&req->r_completion);
> + if (!already_completed) {
> + if (req->r_callback)
> + req->r_callback(req, msg);
> + else
> + complete_all(&req->r_completion);
> + }
>
> if (flags & CEPH_OSD_FLAG_ONDISK)
> complete_request(req);
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] libceph: fix "safe" completion
2013-04-30 17:04 ` Gregory Farnum
@ 2013-05-01 10:13 ` Yan, Zheng
0 siblings, 0 replies; 3+ messages in thread
From: Yan, Zheng @ 2013-05-01 10:13 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org, Alex Elder
The kernel client set the FLAG_ACK flag only when it can't get Fb or Fl caps
(O_DIRECT and O_SYNC writes do not set the FLAG_ACK flag). ceph_sync_write()
allocates separate buffer to hold user date, it's safe even in the case of
OSD failure. I agree that the names and usages of callbacks are confusing.
Other than that, I can't see any issue.
Regards
Yan, Zheng
On 05/01/2013 01:04 AM, Gregory Farnum wrote:
> I was looking at this code recently and had similar thoughts; I've
> included the relevant parts of a conversation on this topic below. In
> light of that I'm not sure if this patch is appropriate or not (it
> might be; it certainly needs some cleanup IMO).
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> ---------- Forwarded message ----------
> From: Sage Weil <sage@inktank.com>
> Date: Thu, Apr 18, 2013 at 1:41 PM
> Subject: Re: use of osd_client's r_callback
> To: Gregory Farnum <greg@inktank.com>
> Cc: Alex Elder <elder@inktank.com>, Josh Durgin <josh.durgin@inktank.com>
>
>
> It is a bit fragile, but not broken AFAICS. We only se the FLAG_ACK flag
> on the request (which controls whether we get an ACK in addition to a
> COMMIT reply from the osd) for the sync write path (O_DIRECT and O_SYNC
> writes).. and in that case, the callback is expecting to be called on the
> ack and the commit is just clenaing up that safe/unsafe list for fsync.
>
> I think we could clean this code up so that the names and meanings of the
> callbacks are more consistent (e.g., r_commit_callback vs
> r_ack_callback)...
>
> s
>
> On Thu, 18 Apr 2013, Gregory Farnum wrote:
>
>> The recent series of patches to deal with deadlocks has me spending a
>> little bit more time looking at the kernel client, especially after I
>> didn't understand some of the comments going back and forth about safe
>> and unsafe lists and callbacks (specifically, compared to their
>> meanings in userspace). This investigation has left me concerned, and
>> maybe confused, about when we're reporting writes as done.
>>
>> In particular, following on from change to "unsafe" so it no longer
>> indicates any communication from the OSD (which might be fine?), I
>> spent some time looking at how the other callback is handled. As best
>> I can tell, the osdc's "r_callback" is used, from the caller's
>> perspective (drivers/block/rbd.c and fs/ceph/*), to indicate that a
>> write is permanent and it can forget the data/unlock the pages/etc ?
>> this matches what the userspace clients do with a "safe" response.
>> However, in looking at the handle_reply() function, it sure looks to
>> me as if r_callback() is invoked on the first reply from an OSD ?
>> which can in many cases be an "unsafe" reply meaning that the data
>> hasn't been persisted yet! The only reference I can find to it seeing
>> if the reply is a safe or unsafe reply happens during duplicate
>> response checking, on the second (or subsequent) responses.
>>
>> Am I missing something here, or is the kernel client broken under OSD
>> failures? (I note that these issues aren't going to turn up when the
>> OSDs are running xfs/ext4 because on those FSes the first reply will
>> always be the "safe" response, which explains why we haven't detected
>> them if there are no other reasons.)
>> -Greg
>>
>>
>
>
> On Tue, Apr 30, 2013 at 6:17 AM, Yan, Zheng <zheng.z.yan@intel.com> wrote:
>> From: "Yan, Zheng" <zheng.z.yan@intel.com>
>>
>> complete_request() is never called if the first OSD replay doesn't
>> have ONDISK flag.
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
>> ---
>> include/linux/ceph/osd_client.h | 1 -
>> net/ceph/osd_client.c | 16 ++++++++--------
>> 2 files changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h
>> index 4191cd2..70f79fb 100644
>> --- a/include/linux/ceph/osd_client.h
>> +++ b/include/linux/ceph/osd_client.h
>> @@ -145,7 +145,6 @@ struct ceph_osd_request {
>> s32 r_reply_op_result[CEPH_OSD_MAX_OP];
>> int r_got_reply;
>> int r_linger;
>> - int r_completed;
>>
>> struct ceph_osd_client *r_osdc;
>> struct kref r_kref;
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 57d8db5..8a5fc37 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -1522,6 +1522,8 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>> for (i = 0; i < numops; i++)
>> req->r_reply_op_result[i] = ceph_decode_32(&p);
>>
>> + already_completed = req->r_got_reply;
>> +
>> if (!req->r_got_reply) {
>>
>> req->r_result = result;
>> @@ -1552,16 +1554,14 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>> ((flags & CEPH_OSD_FLAG_WRITE) == 0))
>> __unregister_request(osdc, req);
>>
>> - already_completed = req->r_completed;
>> - req->r_completed = 1;
>> mutex_unlock(&osdc->request_mutex);
>> - if (already_completed)
>> - goto done;
>>
>> - if (req->r_callback)
>> - req->r_callback(req, msg);
>> - else
>> - complete_all(&req->r_completion);
>> + if (!already_completed) {
>> + if (req->r_callback)
>> + req->r_callback(req, msg);
>> + else
>> + complete_all(&req->r_completion);
>> + }
>>
>> if (flags & CEPH_OSD_FLAG_ONDISK)
>> complete_request(req);
>> --
>> 1.8.1.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-05-01 10:13 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-30 13:17 [PATCH] libceph: fix "safe" completion Yan, Zheng
2013-04-30 17:04 ` Gregory Farnum
2013-05-01 10:13 ` Yan, Zheng
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.