* [PATCH] rbd: fix I/O error propagation for reads
From: Josh Durgin @ 2013-08-27 1:34 UTC
To: ceph-devel

When a request returns an error, the driver needs to report the entire
extent of the request as completed. Writes already did this, since
they always set xferred = length, but reads were skipping that step if
an error other than -ENOENT occurred. Instead, rbd would end up
passing 0 xferred to blk_end_request(), which would always report
needing more data. This resulted in an assert failing when more data
was required by the block layer, but all the object requests were
done:

[ 1868.719077] rbd: obj_request read result -108 xferred 0
[ 1868.719077]
[ 1868.719518] end_request: I/O error, dev rbd1, sector 0
[ 1868.719739]
[ 1868.719739] Assertion failure in rbd_img_obj_callback() at line 1736:
[ 1868.719739]
[ 1868.719739] rbd_assert(more ^ (which == img_request->obj_request_count));

Without this assert, reads that hit errors would hang forever, since
the block layer considered them incomplete.

Fixes: http://tracker.ceph.com/issues/5647
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
---
 drivers/block/rbd.c | 14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 0d669ae..f8fd7d3 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1557,11 +1557,12 @@ rbd_img_obj_request_read_callback(struct rbd_obj_request *obj_request)
 		obj_request, obj_request->img_request, obj_request->result,
 		xferred, length);
 	/*
-	 * ENOENT means a hole in the image. We zero-fill the
-	 * entire length of the request. A short read also implies
-	 * zero-fill to the end of the request. Either way we
-	 * update the xferred count to indicate the whole request
-	 * was satisfied.
+	 * ENOENT means a hole in the image. We zero-fill the entire
+	 * length of the request. A short read also implies zero-fill
+	 * to the end of the request. An error requires the whole
+	 * length of the request to be reported finished with an error
+	 * to the block layer. In each case we update the xferred
+	 * count to indicate the whole request was satisfied.
 	 */
 	rbd_assert(obj_request->type != OBJ_REQUEST_NODATA);
 	if (obj_request->result == -ENOENT) {
@@ -1570,14 +1571,13 @@ rbd_img_obj_request_read_callback(struct rbd_obj_request *obj_request)
 		else
 			zero_pages(obj_request->pages, 0, length);
 		obj_request->result = 0;
-		obj_request->xferred = length;
 	} else if (xferred < length && !obj_request->result) {
 		if (obj_request->type == OBJ_REQUEST_BIO)
 			zero_bio_chain(obj_request->bio_list, xferred);
 		else
 			zero_pages(obj_request->pages, xferred, length);
-		obj_request->xferred = length;
 	}
+	obj_request->xferred = length;
 	obj_request_done_set(obj_request);
 }
 
--
1.7.2.5
* Re: [PATCH] rbd: fix I/O error propagation for reads
From: Mike Dawson @ 2013-08-27 3:27 UTC
To: Josh Durgin; +Cc: ceph-devel

Josh,

The original bug is marked as krbd, but could this bug affect rbd
volumes mounted via qemu as well? If so, could you describe how it might
block a qemu guest?

We've been fighting I/O issues on some of our guests for some time. With
qemu 1.4.0, we saw the entire guest freeze. But now with qemu 1.5.2,
which includes your asynchronous flush patch, the issue is typified by
periodic dips in performance and high latency (especially for reads, it
seems). Could this bug be related?

Thanks,
Mike Dawson
* Re: [PATCH] rbd: fix I/O error propagation for reads
From: Josh Durgin @ 2013-08-27 7:19 UTC
To: Mike Dawson; +Cc: ceph-devel

On 08/26/2013 08:27 PM, Mike Dawson wrote:
> Josh,
>
> The original bug is marked as krbd, but could this bug affect rbd
> volumes mounted via qemu as well? If so, could you describe how it might
> block a qemu guest?

No, this is just a patch for the kernel rbd driver, which doesn't
affect qemu at all.

> We've been fighting I/O issues on some of our guests for some time. With
> qemu 1.4.0, we saw the entire guest freeze. But now with qemu 1.5.2,
> which includes your asynchronous flush patch, the issue is typified by
> periodic dips in performance and high latency (especially for reads, it
> seems). Could this bug be related?

A good next step for tracking this down would be narrowing in on the
source of the periods of high latency, starting with whether they're
primarily coming from the server or client side. Since it mainly affects
reads, I'd guess it's more likely to be an osd-side issue. If you look
at the admin socket's dump_historic_ops, do you see higher op durations
around the dips in performance? What about any correlation with
underlying disk stats from iostat -x?
* Re: [PATCH] rbd: fix I/O error propagation for reads
From: Alex Elder @ 2013-08-27 12:29 UTC
To: Josh Durgin; +Cc: ceph-devel

On 08/26/2013 08:34 PM, Josh Durgin wrote:
> When a request returns an error, the driver needs to report the entire
> extent of the request as completed. Writes already did this, since

You're right. The block layer needs to "consume" the bytes in this
portion of the image request whether or not they were completed
successfully.

This looks good to me.

Reviewed-by: Alex Elder <elder@linaro.org>
* Re: [PATCH] rbd: fix I/O error propagation for reads
From: Sage Weil @ 2013-08-27 15:36 UTC
To: Alex Elder; +Cc: Josh Durgin, ceph-devel
On Tue, 27 Aug 2013, Alex Elder wrote:
> On 08/26/2013 08:34 PM, Josh Durgin wrote:
> > When a request returns an error, the driver needs to report the entire
> > extent of the request as completed. Writes already did this, since
>
> You're right. The block layer needs to "consume" the bytes in this
> portion of the image request whether or not they were completed
> successfully.
>
> This looks good to me.
>
> Reviewed-by: Alex Elder <elder@linaro.org>

This one should go to Linus for 3.11. I added this to the testing branch
and put a Cc: stable tag for 3.10 in there. Is that the right set of kernels
to backport to?

Thanks!
sage
* Re: [PATCH] rbd: fix I/O error propagation for reads
From: Alex Elder @ 2013-08-27 15:40 UTC
To: Sage Weil; +Cc: Josh Durgin, ceph-devel
On 08/27/2013 10:36 AM, Sage Weil wrote:
> This one should go to Linus for 3.11. I added this to the testing branch
> and put a Cc: stable tag for 3.10 in there. Is that the right set of kernels
> to backport to?

3.10 yes. 3.9 is EOL. 3.4.59 (longterm) did not include this code,
so yes, that's the right set of kernels. (I did not check on anything
Ubuntu is supporting.)

-Alex

Thread overview: 6+ messages
2013-08-27 1:34 [PATCH] rbd: fix I/O error propagation for reads Josh Durgin
2013-08-27 3:27 ` Mike Dawson
2013-08-27 7:19 ` Josh Durgin
2013-08-27 12:29 ` Alex Elder
2013-08-27 15:36 ` Sage Weil
2013-08-27 15:40 ` Alex Elder