All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ceph: fix error handling in ceph_sync_write
@ 2022-08-24 20:53 Jeff Layton
  2022-08-25  1:22 ` Xiubo Li
  2022-08-25  8:32 ` Ilya Dryomov
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff Layton @ 2022-08-24 20:53 UTC (permalink / raw)
  To: xiubli; +Cc: idryomov, ceph-devel

ceph_sync_write has assumed that a zero result in req->r_result means
success. Testing with a recent cluster however shows the OSD returning
a non-zero length written here. I'm not sure whether and when this
changed, but fix the code to accept either result.

Assume a negative result means error, and anything else is a success. If
we're given a short length, then return a short write.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/file.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 86265713a743..c0b2c8968be9 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 					  req->r_end_latency, len, ret);
 out:
 		ceph_osdc_put_request(req);
-		if (ret != 0) {
+		if (ret < 0) {
 			ceph_set_error_write(ci);
 			break;
 		}
 
+		/*
+		 * FIXME: it's unclear whether all OSD versions return the
+		 * length written on a write. For now, assume that a 0 return
+		 * means that everything got written.
+		 */
+		if (ret && ret < len)
+			len = ret;
+
 		ceph_clear_error_write(ci);
 		pos += len;
 		written += len;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
@ 2022-08-25  1:22 ` Xiubo Li
  2022-08-25  8:32 ` Ilya Dryomov
  1 sibling, 0 replies; 8+ messages in thread
From: Xiubo Li @ 2022-08-25  1:22 UTC (permalink / raw)
  To: Jeff Layton; +Cc: idryomov, ceph-devel

Hi Jeff,

How reproducible of this ? Two weeks ago I have run some tests didn't 
hit this.

-- Xiubo

On 8/25/22 4:53 AM, Jeff Layton wrote:
> ceph_sync_write has assumed that a zero result in req->r_result means
> success. Testing with a recent cluster however shows the OSD returning
> a non-zero length written here. I'm not sure whether and when this
> changed, but fix the code to accept either result.
>
> Assume a negative result means error, and anything else is a success. If
> we're given a short length, then return a short write.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>   fs/ceph/file.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 86265713a743..c0b2c8968be9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>   					  req->r_end_latency, len, ret);
>   out:
>   		ceph_osdc_put_request(req);
> -		if (ret != 0) {
> +		if (ret < 0) {
>   			ceph_set_error_write(ci);
>   			break;
>   		}
>   
> +		/*
> +		 * FIXME: it's unclear whether all OSD versions return the
> +		 * length written on a write. For now, assume that a 0 return
> +		 * means that everything got written.
> +		 */
> +		if (ret && ret < len)
> +			len = ret;
> +
>   		ceph_clear_error_write(ci);
>   		pos += len;
>   		written += len;


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
  2022-08-25  1:22 ` Xiubo Li
@ 2022-08-25  8:32 ` Ilya Dryomov
  2022-08-25  9:41   ` Luís Henriques
  2022-08-25 10:56   ` Jeff Layton
  1 sibling, 2 replies; 8+ messages in thread
From: Ilya Dryomov @ 2022-08-25  8:32 UTC (permalink / raw)
  To: Jeff Layton; +Cc: xiubli, ceph-devel

On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> ceph_sync_write has assumed that a zero result in req->r_result means
> success. Testing with a recent cluster however shows the OSD returning
> a non-zero length written here. I'm not sure whether and when this
> changed, but fix the code to accept either result.
>
> Assume a negative result means error, and anything else is a success. If
> we're given a short length, then return a short write.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/file.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 86265713a743..c0b2c8968be9 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>                                           req->r_end_latency, len, ret);
>  out:
>                 ceph_osdc_put_request(req);
> -               if (ret != 0) {
> +               if (ret < 0) {
>                         ceph_set_error_write(ci);
>                         break;
>                 }
>
> +               /*
> +                * FIXME: it's unclear whether all OSD versions return the
> +                * length written on a write. For now, assume that a 0 return
> +                * means that everything got written.
> +                */
> +               if (ret && ret < len)
> +                       len = ret;
> +
>                 ceph_clear_error_write(ci);
>                 pos += len;
>                 written += len;
> --
> 2.37.2
>

Hi Jeff,

AFAIK OSDs aren't allowed to return any kind of length on a write
and there is no such thing as a short write.  This definitely needs
deeper investigation.

What is the cluster version you are testing against?

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-25  8:32 ` Ilya Dryomov
@ 2022-08-25  9:41   ` Luís Henriques
  2022-08-25 13:18     ` Jeff Layton
  2022-08-25 10:56   ` Jeff Layton
  1 sibling, 1 reply; 8+ messages in thread
From: Luís Henriques @ 2022-08-25  9:41 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Jeff Layton, xiubli, ceph-devel

On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote:
> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > ceph_sync_write has assumed that a zero result in req->r_result means
> > success. Testing with a recent cluster however shows the OSD returning
> > a non-zero length written here. I'm not sure whether and when this
> > changed, but fix the code to accept either result.
> >
> > Assume a negative result means error, and anything else is a success. If
> > we're given a short length, then return a short write.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/file.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 86265713a743..c0b2c8968be9 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> >                                           req->r_end_latency, len, ret);
> >  out:
> >                 ceph_osdc_put_request(req);
> > -               if (ret != 0) {
> > +               if (ret < 0) {
> >                         ceph_set_error_write(ci);
> >                         break;
> >                 }
> >
> > +               /*
> > +                * FIXME: it's unclear whether all OSD versions return the
> > +                * length written on a write. For now, assume that a 0 return
> > +                * means that everything got written.
> > +                */
> > +               if (ret && ret < len)
> > +                       len = ret;
> > +
> >                 ceph_clear_error_write(ci);
> >                 pos += len;
> >                 written += len;
> > --
> > 2.37.2
> >
> 
> Hi Jeff,
> 
> AFAIK OSDs aren't allowed to return any kind of length on a write
> and there is no such thing as a short write.  This definitely needs
> deeper investigation.
> 
> What is the cluster version you are testing against?

OK, I'm only seeing 'ret' being set to the write length only when enabling
encryption (i.e. with test_dummy_encryption mount option).  So, maybe the
right fix is something like:

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 16dcade66923..5119d87d61fb 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
 				ceph_release_page_vector(pages, num_pages);
 				break;
 			}
+			ret = 0;
 		}
 
 		req = ceph_osdc_new_request(osdc, &ci->i_layout,

Cheers,
--
Luís

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-25  8:32 ` Ilya Dryomov
  2022-08-25  9:41   ` Luís Henriques
@ 2022-08-25 10:56   ` Jeff Layton
  2022-08-25 13:16     ` Jeff Layton
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 10:56 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: xiubli, ceph-devel

On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> > 
> > ceph_sync_write has assumed that a zero result in req->r_result means
> > success. Testing with a recent cluster however shows the OSD returning
> > a non-zero length written here. I'm not sure whether and when this
> > changed, but fix the code to accept either result.
> > 
> > Assume a negative result means error, and anything else is a success. If
> > we're given a short length, then return a short write.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/file.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > index 86265713a743..c0b2c8968be9 100644
> > --- a/fs/ceph/file.c
> > +++ b/fs/ceph/file.c
> > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> >                                           req->r_end_latency, len, ret);
> >  out:
> >                 ceph_osdc_put_request(req);
> > -               if (ret != 0) {
> > +               if (ret < 0) {
> >                         ceph_set_error_write(ci);
> >                         break;
> >                 }
> > 
> > +               /*
> > +                * FIXME: it's unclear whether all OSD versions return the
> > +                * length written on a write. For now, assume that a 0 return
> > +                * means that everything got written.
> > +                */
> > +               if (ret && ret < len)
> > +                       len = ret;
> > +
> >                 ceph_clear_error_write(ci);
> >                 pos += len;
> >                 written += len;
> > --
> > 2.37.2
> > 
> 
> Hi Jeff,
> 
> AFAIK OSDs aren't allowed to return any kind of length on a write
> and there is no such thing as a short write.  This definitely needs
> deeper investigation.
> 
> What is the cluster version you are testing against?
> 

That's what I had thought too but I wasn't sure:

    [ceph: root@quad1 /]# ceph --version
    ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)

I'll see if I can confirm that this is coming from the OSD and not some
other layer as well.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-25 10:56   ` Jeff Layton
@ 2022-08-25 13:16     ` Jeff Layton
  2022-08-26  0:07       ` Xiubo Li
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 13:16 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: xiubli, ceph-devel

On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote:
> On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
> > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > ceph_sync_write has assumed that a zero result in req->r_result means
> > > success. Testing with a recent cluster however shows the OSD returning
> > > a non-zero length written here. I'm not sure whether and when this
> > > changed, but fix the code to accept either result.
> > > 
> > > Assume a negative result means error, and anything else is a success. If
> > > we're given a short length, then return a short write.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/file.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 86265713a743..c0b2c8968be9 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > >                                           req->r_end_latency, len, ret);
> > >  out:
> > >                 ceph_osdc_put_request(req);
> > > -               if (ret != 0) {
> > > +               if (ret < 0) {
> > >                         ceph_set_error_write(ci);
> > >                         break;
> > >                 }
> > > 
> > > +               /*
> > > +                * FIXME: it's unclear whether all OSD versions return the
> > > +                * length written on a write. For now, assume that a 0 return
> > > +                * means that everything got written.
> > > +                */
> > > +               if (ret && ret < len)
> > > +                       len = ret;
> > > +
> > >                 ceph_clear_error_write(ci);
> > >                 pos += len;
> > >                 written += len;
> > > --
> > > 2.37.2
> > > 
> > 
> > Hi Jeff,
> > 
> > AFAIK OSDs aren't allowed to return any kind of length on a write
> > and there is no such thing as a short write.  This definitely needs
> > deeper investigation.
> > 
> > What is the cluster version you are testing against?
> > 
> 
> That's what I had thought too but I wasn't sure:
> 
>     [ceph: root@quad1 /]# ceph --version
>     ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)
> 
> I'll see if I can confirm that this is coming from the OSD and not some
> other layer as well.

My mistake. This bug turns out to be a different bug in the fscrypt
stack. We can drop this patch (and I probably should have sent it as an
RFC in the first place). Sorry for the noise!

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-25  9:41   ` Luís Henriques
@ 2022-08-25 13:18     ` Jeff Layton
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2022-08-25 13:18 UTC (permalink / raw)
  To: Luís Henriques, Ilya Dryomov; +Cc: xiubli, ceph-devel

On Thu, 2022-08-25 at 10:41 +0100, Luís Henriques wrote:
> On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote:
> > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > 
> > > ceph_sync_write has assumed that a zero result in req->r_result means
> > > success. Testing with a recent cluster however shows the OSD returning
> > > a non-zero length written here. I'm not sure whether and when this
> > > changed, but fix the code to accept either result.
> > > 
> > > Assume a negative result means error, and anything else is a success. If
> > > we're given a short length, then return a short write.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  fs/ceph/file.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> > > index 86265713a743..c0b2c8968be9 100644
> > > --- a/fs/ceph/file.c
> > > +++ b/fs/ceph/file.c
> > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
> > >                                           req->r_end_latency, len, ret);
> > >  out:
> > >                 ceph_osdc_put_request(req);
> > > -               if (ret != 0) {
> > > +               if (ret < 0) {
> > >                         ceph_set_error_write(ci);
> > >                         break;
> > >                 }
> > > 
> > > +               /*
> > > +                * FIXME: it's unclear whether all OSD versions return the
> > > +                * length written on a write. For now, assume that a 0 return
> > > +                * means that everything got written.
> > > +                */
> > > +               if (ret && ret < len)
> > > +                       len = ret;
> > > +
> > >                 ceph_clear_error_write(ci);
> > >                 pos += len;
> > >                 written += len;
> > > --
> > > 2.37.2
> > > 
> > 
> > Hi Jeff,
> > 
> > AFAIK OSDs aren't allowed to return any kind of length on a write
> > and there is no such thing as a short write.  This definitely needs
> > deeper investigation.
> > 
> > What is the cluster version you are testing against?
> 
> OK, I'm only seeing 'ret' being set to the write length only when enabling
> encryption (i.e. with test_dummy_encryption mount option).  So, maybe the
> right fix is something like:
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 16dcade66923..5119d87d61fb 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>  				ceph_release_page_vector(pages, num_pages);
>  				break;
>  			}
> +			ret = 0;
>  		}
>  
>  		req = ceph_osdc_new_request(osdc, &ci->i_layout,
> 

No, actually. I think the original patch I sent is just wrong. There was
another bug (another missing ceph_osdc_wait_request) in the fscrypt
stack that was causing r_result to not appear to be zero.

I've fixed this in the ceph-fscrypt branch in my tree and it seems to be
working. I'll plan do a re-send of the fscrypt patches that are not yet
in the testing branch today.

Thanks!
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ceph: fix error handling in ceph_sync_write
  2022-08-25 13:16     ` Jeff Layton
@ 2022-08-26  0:07       ` Xiubo Li
  0 siblings, 0 replies; 8+ messages in thread
From: Xiubo Li @ 2022-08-26  0:07 UTC (permalink / raw)
  To: Jeff Layton, Ilya Dryomov; +Cc: ceph-devel


On 8/25/22 9:16 PM, Jeff Layton wrote:
> On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote:
>> On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote:
>>> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote:
>>>> ceph_sync_write has assumed that a zero result in req->r_result means
>>>> success. Testing with a recent cluster however shows the OSD returning
>>>> a non-zero length written here. I'm not sure whether and when this
>>>> changed, but fix the code to accept either result.
>>>>
>>>> Assume a negative result means error, and anything else is a success. If
>>>> we're given a short length, then return a short write.
>>>>
>>>> Signed-off-by: Jeff Layton <jlayton@kernel.org>
>>>> ---
>>>>   fs/ceph/file.c | 10 +++++++++-
>>>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
>>>> index 86265713a743..c0b2c8968be9 100644
>>>> --- a/fs/ceph/file.c
>>>> +++ b/fs/ceph/file.c
>>>> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos,
>>>>                                            req->r_end_latency, len, ret);
>>>>   out:
>>>>                  ceph_osdc_put_request(req);
>>>> -               if (ret != 0) {
>>>> +               if (ret < 0) {
>>>>                          ceph_set_error_write(ci);
>>>>                          break;
>>>>                  }
>>>>
>>>> +               /*
>>>> +                * FIXME: it's unclear whether all OSD versions return the
>>>> +                * length written on a write. For now, assume that a 0 return
>>>> +                * means that everything got written.
>>>> +                */
>>>> +               if (ret && ret < len)
>>>> +                       len = ret;
>>>> +
>>>>                  ceph_clear_error_write(ci);
>>>>                  pos += len;
>>>>                  written += len;
>>>> --
>>>> 2.37.2
>>>>
>>> Hi Jeff,
>>>
>>> AFAIK OSDs aren't allowed to return any kind of length on a write
>>> and there is no such thing as a short write.  This definitely needs
>>> deeper investigation.
>>>
>>> What is the cluster version you are testing against?
>>>
>> That's what I had thought too but I wasn't sure:
>>
>>      [ceph: root@quad1 /]# ceph --version
>>      ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev)
>>
>> I'll see if I can confirm that this is coming from the OSD and not some
>> other layer as well.
> My mistake. This bug turns out to be a different bug in the fscrypt
> stack. We can drop this patch (and I probably should have sent it as an
> RFC in the first place). Sorry for the noise!
>
Cool, thanks Jeff.

I saw you new update about this, they look good to me and will test them.

- Xiubo


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-08-26  0:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton
2022-08-25  1:22 ` Xiubo Li
2022-08-25  8:32 ` Ilya Dryomov
2022-08-25  9:41   ` Luís Henriques
2022-08-25 13:18     ` Jeff Layton
2022-08-25 10:56   ` Jeff Layton
2022-08-25 13:16     ` Jeff Layton
2022-08-26  0:07       ` Xiubo Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.