* [PATCH] ceph: fix error handling in ceph_sync_write
@ 2022-08-24 20:53 Jeff Layton
2022-08-25 1:22 ` Xiubo Li
2022-08-25 8:32 ` Ilya Dryomov
0 siblings, 2 replies; 8+ messages in thread
From: Jeff Layton @ 2022-08-24 20:53 UTC (permalink / raw)
To: xiubli; +Cc: idryomov, ceph-devel
ceph_sync_write has assumed that a zero result in req->r_result means
success. Testing with a recent cluster however shows the OSD returning
a non-zero length written here. I'm not sure whether and when this
changed, but fix the code to accept either result.
Assume a negative result means error, and anything else is a success. If
we're given a short length, then return a short write.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/ceph/file.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 86265713a743..c0b2c8968be9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
req->r_end_latency, len, ret);
out:
ceph_osdc_put_request(req);
- if (ret != 0) {
+ if (ret < 0) {
ceph_set_error_write(ci);
break;
}
+ /*
+ * FIXME: it's unclear whether all OSD versions return the
+ * length written on a write. For now, assume that a 0 return
+ * means that everything got written.
+ */
+ if (ret && ret < len)
+ len = ret;
+
ceph_clear_error_write(ci);
pos += len;
written += len;
--
2.37.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
@ 2022-08-25 1:22 ` Xiubo Li
2022-08-25 8:32 ` Ilya Dryomov
1 sibling, 0 replies; 8+ messages in thread
From: Xiubo Li @ 2022-08-25 1:22 UTC (permalink / raw)
To: Jeff Layton; +Cc: idryomov, ceph-devel
Hi Jeff,
How reproducible of this ? Two weeks ago I have run some tests didn't
hit this.
-- Xiubo
On 8/25/22 4:53 AM, Jeff Layton wrote:
> ceph_sync_write has assumed that a zero result in req->r_result means
> success. Testing with a recent cluster however shows the OSD returning
> a non-zero length written here. I'm not sure whether and when this
> changed, but fix the code to accept either result.
>
> Assume a negative result means error, and anything else is a success. If
> we're given a short length, then return a short write.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/ceph/file.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 86265713a743..c0b2c8968be9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> req->r_end_latency, len, ret);
> out:
> ceph_osdc_put_request(req);
> - if (ret != 0) {
> + if (ret < 0) {
> ceph_set_error_write(ci);
> break;
> }
>
> + /*
> + * FIXME: it's unclear whether all OSD versions return the
> + * length written on a write. For now, assume that a 0 return
> + * means that everything got written.
> + */
> + if (ret && ret < len)
> + len = ret;
> +
> ceph_clear_error_write(ci);
> pos += len;
> written += len;
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
2022-08-25 1:22 ` Xiubo Li
@ 2022-08-25 8:32 ` Ilya Dryomov
2022-08-25 9:41 ` Luís Henriques
2022-08-25 10:56 ` Jeff Layton
1 sibling, 2 replies; 8+ messages in thread
From: Ilya Dryomov @ 2022-08-25 8:32 UTC (permalink / raw)
To: Jeff Layton; +Cc: xiubli, ceph-devel
On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> ceph_sync_write has assumed that a zero result in req->r_result means
> success. Testing with a recent cluster however shows the OSD returning
> a non-zero length written here. I'm not sure whether and when this
> changed, but fix the code to accept either result.
>
> Assume a negative result means error, and anything else is a success. If
> we're given a short length, then return a short write.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/ceph/file.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 86265713a743..c0b2c8968be9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> req->r_end_latency, len, ret);
> out:
> ceph_osdc_put_request(req);
> - if (ret != 0) {
> + if (ret < 0) {
> ceph_set_error_write(ci);
> break;
> }
>
> + /*
> + * FIXME: it's unclear whether all OSD versions return the
> + * length written on a write. For now, assume that a 0 return
> + * means that everything got written.
> + */
> + if (ret && ret < len)
> + len = ret;
> +
> ceph_clear_error_write(ci);
> pos += len;
> written += len;
> --
> 2.37.2
>
Hi Jeff,
AFAIK OSDs aren't allowed to return any kind of length on a write
and there is no such thing as a short write. This definitely needs
deeper investigation.
What is the cluster version you are testing against?
Thanks,
Ilya
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-25 8:32 ` Ilya Dryomov
@ 2022-08-25 9:41 ` Luís Henriques
2022-08-25 13:18 ` Jeff Layton
2022-08-25 10:56 ` Jeff Layton
1 sibling, 1 reply; 8+ messages in thread
From: Luís Henriques @ 2022-08-25 9:41 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: Jeff Layton, xiubli, ceph-devel
On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote:
> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > ceph_sync_write has assumed that a zero result in req->r_result means
> > success. Testing with a recent cluster however shows the OSD returning
> > a non-zero length written here. I'm not sure whether and when this
> > changed, but fix the code to accept either result.
> >
> > Assume a negative result means error, and anything else is a success. If
> > we're given a short length, then return a short write.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/ceph/file.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 86265713a743..c0b2c8968be9 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > req->r_end_latency, len, ret);
> > out:
> > ceph_osdc_put_request(req);
> > - if (ret != 0) {
> > + if (ret < 0) {
> > ceph_set_error_write(ci);
> > break;
> > }
> >
> > + /*
> > + * FIXME: it's unclear whether all OSD versions return the
> > + * length written on a write. For now, assume that a 0 return
> > + * means that everything got written.
> > + */
> > + if (ret && ret < len)
> > + len = ret;
> > +
> > ceph_clear_error_write(ci);
> > pos += len;
> > written += len;
> > --
> > 2.37.2
> >
>
> Hi Jeff,
>
> AFAIK OSDs aren't allowed to return any kind of length on a write
> and there is no such thing as a short write. This definitely needs
> deeper investigation.
>
> What is the cluster version you are testing against?
OK, I'm only seeing 'ret' being set to the write length only when enabling
encryption (i.e. with test_dummy_encryption mount option). So, maybe the
right fix is something like:
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 16dcade66923..5119d87d61fb 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
ceph_release_page_vector(pages, num_pages);
break;
}
+ ret = 0;
}
req = ceph_osdc_new_request(osdc, &ci->i_layout,
Cheers,
--
Luís
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-25 8:32 ` Ilya Dryomov
2022-08-25 9:41 ` Luís Henriques
@ 2022-08-25 10:56 ` Jeff Layton
2022-08-25 13:16 ` Jeff Layton
1 sibling, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 10:56 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: xiubli, ceph-devel
On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > ceph_sync_write has assumed that a zero result in req->r_result means
> > success. Testing with a recent cluster however shows the OSD returning
> > a non-zero length written here. I'm not sure whether and when this
> > changed, but fix the code to accept either result.
> >
> > Assume a negative result means error, and anything else is a success. If
> > we're given a short length, then return a short write.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/ceph/file.c | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 86265713a743..c0b2c8968be9 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > req->r_end_latency, len, ret);
> > out:
> > ceph_osdc_put_request(req);
> > - if (ret != 0) {
> > + if (ret < 0) {
> > ceph_set_error_write(ci);
> > break;
> > }
> >
> > + /*
> > + * FIXME: it's unclear whether all OSD versions return the
> > + * length written on a write. For now, assume that a 0 return
> > + * means that everything got written.
> > + */
> > + if (ret && ret < len)
> > + len = ret;
> > +
> > ceph_clear_error_write(ci);
> > pos += len;
> > written += len;
> > --
> > 2.37.2
> >
>
> Hi Jeff,
>
> AFAIK OSDs aren't allowed to return any kind of length on a write
> and there is no such thing as a short write. This definitely needs
> deeper investigation.
>
> What is the cluster version you are testing against?
>
That's what I had thought too but I wasn't sure:
[ceph: root@quad1 /]# ceph --version
ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)
I'll see if I can confirm that this is coming from the OSD and not some
other layer as well.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-25 10:56 ` Jeff Layton
@ 2022-08-25 13:16 ` Jeff Layton
2022-08-26 0:07 ` Xiubo Li
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 13:16 UTC (permalink / raw)
To: Ilya Dryomov; +Cc: xiubli, ceph-devel
On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote:
> On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
> > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > ceph_sync_write has assumed that a zero result in req->r_result means
> > > success. Testing with a recent cluster however shows the OSD returning
> > > a non-zero length written here. I'm not sure whether and when this
> > > changed, but fix the code to accept either result.
> > >
> > > Assume a negative result means error, and anything else is a success. If
> > > we're given a short length, then return a short write.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > fs/ceph/file.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 86265713a743..c0b2c8968be9 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > > req->r_end_latency, len, ret);
> > > out:
> > > ceph_osdc_put_request(req);
> > > - if (ret != 0) {
> > > + if (ret < 0) {
> > > ceph_set_error_write(ci);
> > > break;
> > > }
> > >
> > > + /*
> > > + * FIXME: it's unclear whether all OSD versions return the
> > > + * length written on a write. For now, assume that a 0 return
> > > + * means that everything got written.
> > > + */
> > > + if (ret && ret < len)
> > > + len = ret;
> > > +
> > > ceph_clear_error_write(ci);
> > > pos += len;
> > > written += len;
> > > --
> > > 2.37.2
> > >
> >
> > Hi Jeff,
> >
> > AFAIK OSDs aren't allowed to return any kind of length on a write
> > and there is no such thing as a short write. This definitely needs
> > deeper investigation.
> >
> > What is the cluster version you are testing against?
> >
>
> That's what I had thought too but I wasn't sure:
>
> [ceph: root@quad1 /]# ceph --version
> ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)
>
> I'll see if I can confirm that this is coming from the OSD and not some
> other layer as well.
My mistake. This bug turns out to be a different bug in the fscrypt
stack. We can drop this patch (and I probably should have sent it as an
RFC in the first place). Sorry for the noise!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-25 9:41 ` Luís Henriques
@ 2022-08-25 13:18 ` Jeff Layton
0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 13:18 UTC (permalink / raw)
To: Luís Henriques, Ilya Dryomov; +Cc: xiubli, ceph-devel
On Thu, 2022-08-25 at 10:41 +0100, Luís Henriques wrote:
> On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote:
> > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > ceph_sync_write has assumed that a zero result in req->r_result means
> > > success. Testing with a recent cluster however shows the OSD returning
> > > a non-zero length written here. I'm not sure whether and when this
> > > changed, but fix the code to accept either result.
> > >
> > > Assume a negative result means error, and anything else is a success. If
> > > we're given a short length, then return a short write.
> > >
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > fs/ceph/file.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 86265713a743..c0b2c8968be9 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > > req->r_end_latency, len, ret);
> > > out:
> > > ceph_osdc_put_request(req);
> > > - if (ret != 0) {
> > > + if (ret < 0) {
> > > ceph_set_error_write(ci);
> > > break;
> > > }
> > >
> > > + /*
> > > + * FIXME: it's unclear whether all OSD versions return the
> > > + * length written on a write. For now, assume that a 0 return
> > > + * means that everything got written.
> > > + */
> > > + if (ret && ret < len)
> > > + len = ret;
> > > +
> > > ceph_clear_error_write(ci);
> > > pos += len;
> > > written += len;
> > > --
> > > 2.37.2
> > >
> >
> > Hi Jeff,
> >
> > AFAIK OSDs aren't allowed to return any kind of length on a write
> > and there is no such thing as a short write. This definitely needs
> > deeper investigation.
> >
> > What is the cluster version you are testing against?
>
> OK, I'm only seeing 'ret' being set to the write length only when enabling
> encryption (i.e. with test_dummy_encryption mount option). So, maybe the
> right fix is something like:
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 16dcade66923..5119d87d61fb 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> ceph_release_page_vector(pages, num_pages);
> break;
> }
> + ret = 0;
> }
>
> req = ceph_osdc_new_request(osdc, &ci->i_layout,
>
No, actually. I think the original patch I sent is just wrong. There was
another bug (another missing ceph_osdc_wait_request) in the fscrypt
stack that was causing r_result to not appear to be zero.
I've fixed this in the ceph-fscrypt branch in my tree and it seems to be
working. I'll plan do a re-send of the fscrypt patches that are not yet
in the testing branch today.
Thanks!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write
2022-08-25 13:16 ` Jeff Layton
@ 2022-08-26 0:07 ` Xiubo Li
0 siblings, 0 replies; 8+ messages in thread
From: Xiubo Li @ 2022-08-26 0:07 UTC (permalink / raw)
To: Jeff Layton, Ilya Dryomov; +Cc: ceph-devel
On 8/25/22 9:16 PM, Jeff Layton wrote:
> On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote:
>> On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
>>> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> ceph_sync_write has assumed that a zero result in req->r_result means
>>>> success. Testing with a recent cluster however shows the OSD returning
>>>> a non-zero length written here. I'm not sure whether and when this
>>>> changed, but fix the code to accept either result.
>>>>
>>>> Assume a negative result means error, and anything else is a success. If
>>>> we're given a short length, then return a short write.
>>>>
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>> fs/ceph/file.c | 10 +++++++++-
>>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 86265713a743..c0b2c8968be9 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>>> req->r_end_latency, len, ret);
>>>> out:
>>>> ceph_osdc_put_request(req);
>>>> - if (ret != 0) {
>>>> + if (ret < 0) {
>>>> ceph_set_error_write(ci);
>>>> break;
>>>> }
>>>>
>>>> + /*
>>>> + * FIXME: it's unclear whether all OSD versions return the
>>>> + * length written on a write. For now, assume that a 0 return
>>>> + * means that everything got written.
>>>> + */
>>>> + if (ret && ret < len)
>>>> + len = ret;
>>>> +
>>>> ceph_clear_error_write(ci);
>>>> pos += len;
>>>> written += len;
>>>> --
>>>> 2.37.2
>>>>
>>> Hi Jeff,
>>>
>>> AFAIK OSDs aren't allowed to return any kind of length on a write
>>> and there is no such thing as a short write. This definitely needs
>>> deeper investigation.
>>>
>>> What is the cluster version you are testing against?
>>>
>> That's what I had thought too but I wasn't sure:
>>
>> [ceph: root@quad1 /]# ceph --version
>> ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)
>>
>> I'll see if I can confirm that this is coming from the OSD and not some
>> other layer as well.
> My mistake. This bug turns out to be a different bug in the fscrypt
> stack. We can drop this patch (and I probably should have sent it as an
> RFC in the first place). Sorry for the noise!
>
Cool, thanks Jeff.
I saw you new update about this, they look good to me and will test them.
- Xiubo
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-08-26 0:07 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
2022-08-25 1:22 ` Xiubo Li
2022-08-25 8:32 ` Ilya Dryomov
2022-08-25 9:41 ` Luís Henriques
2022-08-25 13:18 ` Jeff Layton
2022-08-25 10:56 ` Jeff Layton
2022-08-25 13:16 ` Jeff Layton
2022-08-26 0:07 ` Xiubo Li
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.