* [PATCH] ceph: fix error handling in ceph_sync_write @ 2022-08-24 20:53 Jeff Layton 2022-08-25 1:22 ` Xiubo Li 2022-08-25 8:32 ` Ilya Dryomov 0 siblings, 2 replies; 8+ messages in thread From: Jeff Layton @ 2022-08-24 20:53 UTC (permalink / raw) To: xiubli; +Cc: idryomov, ceph-devel ceph_sync_write has assumed that a zero result in req->r_result means success. Testing with a recent cluster however shows the OSD returning a non-zero length written here. I'm not sure whether and when this changed, but fix the code to accept either result. Assume a negative result means error, and anything else is a success. If we're given a short length, then return a short write. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/ceph/file.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 86265713a743..c0b2c8968be9 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, req->r_end_latency, len, ret); out: ceph_osdc_put_request(req); - if (ret != 0) { + if (ret < 0) { ceph_set_error_write(ci); break; } + /* + * FIXME: it's unclear whether all OSD versions return the + * length written on a write. For now, assume that a 0 return + * means that everything got written. + */ + if (ret && ret < len) + len = ret; + ceph_clear_error_write(ci); pos += len; written += len; -- 2.37.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton @ 2022-08-25 1:22 ` Xiubo Li 2022-08-25 8:32 ` Ilya Dryomov 1 sibling, 0 replies; 8+ messages in thread From: Xiubo Li @ 2022-08-25 1:22 UTC (permalink / raw) To: Jeff Layton; +Cc: idryomov, ceph-devel Hi Jeff, How reproducible of this ? Two weeks ago I have run some tests didn't hit this. -- Xiubo On 8/25/22 4:53 AM, Jeff Layton wrote: > ceph_sync_write has assumed that a zero result in req->r_result means > success. Testing with a recent cluster however shows the OSD returning > a non-zero length written here. I'm not sure whether and when this > changed, but fix the code to accept either result. > > Assume a negative result means error, and anything else is a success. If > we're given a short length, then return a short write. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/ceph/file.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 86265713a743..c0b2c8968be9 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > req->r_end_latency, len, ret); > out: > ceph_osdc_put_request(req); > - if (ret != 0) { > + if (ret < 0) { > ceph_set_error_write(ci); > break; > } > > + /* > + * FIXME: it's unclear whether all OSD versions return the > + * length written on a write. For now, assume that a 0 return > + * means that everything got written. > + */ > + if (ret && ret < len) > + len = ret; > + > ceph_clear_error_write(ci); > pos += len; > written += len; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton 2022-08-25 1:22 ` Xiubo Li @ 2022-08-25 8:32 ` Ilya Dryomov 2022-08-25 9:41 ` Luís Henriques 2022-08-25 10:56 ` Jeff Layton 1 sibling, 2 replies; 8+ messages in thread From: Ilya Dryomov @ 2022-08-25 8:32 UTC (permalink / raw) To: Jeff Layton; +Cc: xiubli, ceph-devel On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: > > ceph_sync_write has assumed that a zero result in req->r_result means > success. Testing with a recent cluster however shows the OSD returning > a non-zero length written here. I'm not sure whether and when this > changed, but fix the code to accept either result. > > Assume a negative result means error, and anything else is a success. If > we're given a short length, then return a short write. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/ceph/file.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 86265713a743..c0b2c8968be9 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > req->r_end_latency, len, ret); > out: > ceph_osdc_put_request(req); > - if (ret != 0) { > + if (ret < 0) { > ceph_set_error_write(ci); > break; > } > > + /* > + * FIXME: it's unclear whether all OSD versions return the > + * length written on a write. For now, assume that a 0 return > + * means that everything got written. > + */ > + if (ret && ret < len) > + len = ret; > + > ceph_clear_error_write(ci); > pos += len; > written += len; > -- > 2.37.2 > Hi Jeff, AFAIK OSDs aren't allowed to return any kind of length on a write and there is no such thing as a short write. This definitely needs deeper investigation. What is the cluster version you are testing against? Thanks, Ilya ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-25 8:32 ` Ilya Dryomov @ 2022-08-25 9:41 ` Luís Henriques 2022-08-25 13:18 ` Jeff Layton 2022-08-25 10:56 ` Jeff Layton 1 sibling, 1 reply; 8+ messages in thread From: Luís Henriques @ 2022-08-25 9:41 UTC (permalink / raw) To: Ilya Dryomov; +Cc: Jeff Layton, xiubli, ceph-devel On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote: > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > ceph_sync_write has assumed that a zero result in req->r_result means > > success. Testing with a recent cluster however shows the OSD returning > > a non-zero length written here. I'm not sure whether and when this > > changed, but fix the code to accept either result. > > > > Assume a negative result means error, and anything else is a success. If > > we're given a short length, then return a short write. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/ceph/file.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > index 86265713a743..c0b2c8968be9 100644 > > --- a/fs/ceph/file.c > > +++ b/fs/ceph/file.c > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > > req->r_end_latency, len, ret); > > out: > > ceph_osdc_put_request(req); > > - if (ret != 0) { > > + if (ret < 0) { > > ceph_set_error_write(ci); > > break; > > } > > > > + /* > > + * FIXME: it's unclear whether all OSD versions return the > > + * length written on a write. For now, assume that a 0 return > > + * means that everything got written. > > + */ > > + if (ret && ret < len) > > + len = ret; > > + > > ceph_clear_error_write(ci); > > pos += len; > > written += len; > > -- > > 2.37.2 > > > > Hi Jeff, > > AFAIK OSDs aren't allowed to return any kind of length on a write > and there is no such thing as a short write. This definitely needs > deeper investigation. > > What is the cluster version you are testing against? OK, I'm only seeing 'ret' being set to the write length only when enabling encryption (i.e. with test_dummy_encryption mount option). So, maybe the right fix is something like: diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 16dcade66923..5119d87d61fb 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, ceph_release_page_vector(pages, num_pages); break; } + ret = 0; } req = ceph_osdc_new_request(osdc, &ci->i_layout, Cheers, -- Luís ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-25 9:41 ` Luís Henriques @ 2022-08-25 13:18 ` Jeff Layton 0 siblings, 0 replies; 8+ messages in thread From: Jeff Layton @ 2022-08-25 13:18 UTC (permalink / raw) To: Luís Henriques, Ilya Dryomov; +Cc: xiubli, ceph-devel On Thu, 2022-08-25 at 10:41 +0100, Luís Henriques wrote: > On Thu, Aug 25, 2022 at 10:32:56AM +0200, Ilya Dryomov wrote: > > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > ceph_sync_write has assumed that a zero result in req->r_result means > > > success. Testing with a recent cluster however shows the OSD returning > > > a non-zero length written here. I'm not sure whether and when this > > > changed, but fix the code to accept either result. > > > > > > Assume a negative result means error, and anything else is a success. If > > > we're given a short length, then return a short write. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > fs/ceph/file.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > > index 86265713a743..c0b2c8968be9 100644 > > > --- a/fs/ceph/file.c > > > +++ b/fs/ceph/file.c > > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > > > req->r_end_latency, len, ret); > > > out: > > > ceph_osdc_put_request(req); > > > - if (ret != 0) { > > > + if (ret < 0) { > > > ceph_set_error_write(ci); > > > break; > > > } > > > > > > + /* > > > + * FIXME: it's unclear whether all OSD versions return the > > > + * length written on a write. For now, assume that a 0 return > > > + * means that everything got written. > > > + */ > > > + if (ret && ret < len) > > > + len = ret; > > > + > > > ceph_clear_error_write(ci); > > > pos += len; > > > written += len; > > > -- > > > 2.37.2 > > > > > > > Hi Jeff, > > > > AFAIK OSDs aren't allowed to return any kind of length on a write > > and there is no such thing as a short write. This definitely needs > > deeper investigation. > > > > What is the cluster version you are testing against? > > OK, I'm only seeing 'ret' being set to the write length only when enabling > encryption (i.e. with test_dummy_encryption mount option). So, maybe the > right fix is something like: > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > index 16dcade66923..5119d87d61fb 100644 > --- a/fs/ceph/file.c > +++ b/fs/ceph/file.c > @@ -1889,6 +1889,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > ceph_release_page_vector(pages, num_pages); > break; > } > + ret = 0; > } > > req = ceph_osdc_new_request(osdc, &ci->i_layout, > No, actually. I think the original patch I sent is just wrong. There was another bug (another missing ceph_osdc_wait_request) in the fscrypt stack that was causing r_result to not appear to be zero. I've fixed this in the ceph-fscrypt branch in my tree and it seems to be working. I'll plan do a re-send of the fscrypt patches that are not yet in the testing branch today. Thanks! -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-25 8:32 ` Ilya Dryomov 2022-08-25 9:41 ` Luís Henriques @ 2022-08-25 10:56 ` Jeff Layton 2022-08-25 13:16 ` Jeff Layton 1 sibling, 1 reply; 8+ messages in thread From: Jeff Layton @ 2022-08-25 10:56 UTC (permalink / raw) To: Ilya Dryomov; +Cc: xiubli, ceph-devel On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote: > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > ceph_sync_write has assumed that a zero result in req->r_result means > > success. Testing with a recent cluster however shows the OSD returning > > a non-zero length written here. I'm not sure whether and when this > > changed, but fix the code to accept either result. > > > > Assume a negative result means error, and anything else is a success. If > > we're given a short length, then return a short write. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > fs/ceph/file.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > index 86265713a743..c0b2c8968be9 100644 > > --- a/fs/ceph/file.c > > +++ b/fs/ceph/file.c > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > > req->r_end_latency, len, ret); > > out: > > ceph_osdc_put_request(req); > > - if (ret != 0) { > > + if (ret < 0) { > > ceph_set_error_write(ci); > > break; > > } > > > > + /* > > + * FIXME: it's unclear whether all OSD versions return the > > + * length written on a write. For now, assume that a 0 return > > + * means that everything got written. > > + */ > > + if (ret && ret < len) > > + len = ret; > > + > > ceph_clear_error_write(ci); > > pos += len; > > written += len; > > -- > > 2.37.2 > > > > Hi Jeff, > > AFAIK OSDs aren't allowed to return any kind of length on a write > and there is no such thing as a short write. This definitely needs > deeper investigation. > > What is the cluster version you are testing against? > That's what I had thought too but I wasn't sure: [ceph: root@quad1 /]# ceph --version ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev) I'll see if I can confirm that this is coming from the OSD and not some other layer as well. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-25 10:56 ` Jeff Layton @ 2022-08-25 13:16 ` Jeff Layton 2022-08-26 0:07 ` Xiubo Li 0 siblings, 1 reply; 8+ messages in thread From: Jeff Layton @ 2022-08-25 13:16 UTC (permalink / raw) To: Ilya Dryomov; +Cc: xiubli, ceph-devel On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote: > On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote: > > On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > ceph_sync_write has assumed that a zero result in req->r_result means > > > success. Testing with a recent cluster however shows the OSD returning > > > a non-zero length written here. I'm not sure whether and when this > > > changed, but fix the code to accept either result. > > > > > > Assume a negative result means error, and anything else is a success. If > > > we're given a short length, then return a short write. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > fs/ceph/file.c | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/ceph/file.c b/fs/ceph/file.c > > > index 86265713a743..c0b2c8968be9 100644 > > > --- a/fs/ceph/file.c > > > +++ b/fs/ceph/file.c > > > @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, > > > req->r_end_latency, len, ret); > > > out: > > > ceph_osdc_put_request(req); > > > - if (ret != 0) { > > > + if (ret < 0) { > > > ceph_set_error_write(ci); > > > break; > > > } > > > > > > + /* > > > + * FIXME: it's unclear whether all OSD versions return the > > > + * length written on a write. For now, assume that a 0 return > > > + * means that everything got written. > > > + */ > > > + if (ret && ret < len) > > > + len = ret; > > > + > > > ceph_clear_error_write(ci); > > > pos += len; > > > written += len; > > > -- > > > 2.37.2 > > > > > > > Hi Jeff, > > > > AFAIK OSDs aren't allowed to return any kind of length on a write > > and there is no such thing as a short write. This definitely needs > > deeper investigation. > > > > What is the cluster version you are testing against? > > > > That's what I had thought too but I wasn't sure: > > [ceph: root@quad1 /]# ceph --version > ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev) > > I'll see if I can confirm that this is coming from the OSD and not some > other layer as well. My mistake. This bug turns out to be a different bug in the fscrypt stack. We can drop this patch (and I probably should have sent it as an RFC in the first place). Sorry for the noise! -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] ceph: fix error handling in ceph_sync_write 2022-08-25 13:16 ` Jeff Layton @ 2022-08-26 0:07 ` Xiubo Li 0 siblings, 0 replies; 8+ messages in thread From: Xiubo Li @ 2022-08-26 0:07 UTC (permalink / raw) To: Jeff Layton, Ilya Dryomov; +Cc: ceph-devel On 8/25/22 9:16 PM, Jeff Layton wrote: > On Thu, 2022-08-25 at 06:56 -0400, Jeff Layton wrote: >> On Thu, 2022-08-25 at 10:32 +0200, Ilya Dryomov wrote: >>> On Wed, Aug 24, 2022 at 10:53 PM Jeff Layton <jlayton@kernel.org> wrote: >>>> ceph_sync_write has assumed that a zero result in req->r_result means >>>> success. Testing with a recent cluster however shows the OSD returning >>>> a non-zero length written here. I'm not sure whether and when this >>>> changed, but fix the code to accept either result. >>>> >>>> Assume a negative result means error, and anything else is a success. If >>>> we're given a short length, then return a short write. >>>> >>>> Signed-off-by: Jeff Layton <jlayton@kernel.org> >>>> --- >>>> fs/ceph/file.c | 10 +++++++++- >>>> 1 file changed, 9 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >>>> index 86265713a743..c0b2c8968be9 100644 >>>> --- a/fs/ceph/file.c >>>> +++ b/fs/ceph/file.c >>>> @@ -1632,11 +1632,19 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, >>>> req->r_end_latency, len, ret); >>>> out: >>>> ceph_osdc_put_request(req); >>>> - if (ret != 0) { >>>> + if (ret < 0) { >>>> ceph_set_error_write(ci); >>>> break; >>>> } >>>> >>>> + /* >>>> + * FIXME: it's unclear whether all OSD versions return the >>>> + * length written on a write. For now, assume that a 0 return >>>> + * means that everything got written. >>>> + */ >>>> + if (ret && ret < len) >>>> + len = ret; >>>> + >>>> ceph_clear_error_write(ci); >>>> pos += len; >>>> written += len; >>>> -- >>>> 2.37.2 >>>> >>> Hi Jeff, >>> >>> AFAIK OSDs aren't allowed to return any kind of length on a write >>> and there is no such thing as a short write. This definitely needs >>> deeper investigation. >>> >>> What is the cluster version you are testing against? >>> >> That's what I had thought too but I wasn't sure: >> >> [ceph: root@quad1 /]# ceph --version >> ceph version 17.0.0-14400-gf61b38dc (f61b38dc82e94f14e7a0a5f6a5888c0c78fafa6c) quincy (dev) >> >> I'll see if I can confirm that this is coming from the OSD and not some >> other layer as well. > My mistake. This bug turns out to be a different bug in the fscrypt > stack. We can drop this patch (and I probably should have sent it as an > RFC in the first place). Sorry for the noise! > Cool, thanks Jeff. I saw you new update about this, they look good to me and will test them. - Xiubo ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-08-26 0:07 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-08-24 20:53 [PATCH] ceph: fix error handling in ceph_sync_write Jeff Layton 2022-08-25 1:22 ` Xiubo Li 2022-08-25 8:32 ` Ilya Dryomov 2022-08-25 9:41 ` Luís Henriques 2022-08-25 13:18 ` Jeff Layton 2022-08-25 10:56 ` Jeff Layton 2022-08-25 13:16 ` Jeff Layton 2022-08-26 0:07 ` Xiubo Li
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.