From mboxrd@z Thu Jan 1 00:00:00 1970 From: Josh Durgin Subject: Re: [PATCH] rbd: fix I/O error propagation for reads Date: Tue, 27 Aug 2013 00:19:22 -0700 Message-ID: <521C52FA.6070801@inktank.com> References: <1377567242-25736-1-git-send-email-josh.durgin@inktank.com> <521C1CB2.6090008@cloudapt.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from mail-ob0-f176.google.com ([209.85.214.176]:35456 "EHLO mail-ob0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752163Ab3H0HR4 (ORCPT ); Tue, 27 Aug 2013 03:17:56 -0400 Received: by mail-ob0-f176.google.com with SMTP id uz19so4347067obc.35 for ; Tue, 27 Aug 2013 00:17:56 -0700 (PDT) In-Reply-To: <521C1CB2.6090008@cloudapt.com> Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Mike Dawson Cc: ceph-devel@vger.kernel.org On 08/26/2013 08:27 PM, Mike Dawson wrote: > Josh, > > The original bug is marked as krbd, but could this bug could affect rbd > volumes mounted via qemu as well? If so, could you describe how it might > block a qemu guest? No, this is just a patch for the kernel rbd driver, which doesn't affect qemu at all. > We've been fighting i/o issues on some of our guests for some time. With > qemu 1.4.0, we saw the entire guest freeze. But now with qemu 1.5.2 > which includes your asynchronous flush patch, the issue is typified by > periodic dips in performance and high latency (especially for reads, it > seems). Could this bug be related? A good next step for tracking this down would be narrowing in on the source of the periods of high latency - starting with whether they're primarily coming from the server or client side. Since it's especially reads, I'd guess it's more likely to be an osd-side issue. If you look at the admin socket's dump_historic_ops do you see higher op durations around the dips in performance? What about any correlation with underlying disk stats from iostat -x? > Thanks, > Mike Dawson > > > On 8/26/2013 9:34 PM, Josh Durgin wrote: >> When a request returns an error, the driver needs to report the entire >> extent of the request as completed. Writes already did this, since >> they always set xferred = length, but reads were skipping that step if >> an error other than -ENOENT occurred. Instead, rbd would end up >> passing 0 xferred to blk_end_request(), which would always report >> needing more data. This resulted in an assert failing when more data >> was required by the block layer, but all the object requests were >> done: >> >> [ 1868.719077] rbd: obj_request read result -108 xferred 0 >> [ 1868.719077] >> [ 1868.719518] end_request: I/O error, dev rbd1, sector 0 >> [ 1868.719739] >> [ 1868.719739] Assertion failure in rbd_img_obj_callback() at line 1736: >> [ 1868.719739] >> [ 1868.719739] rbd_assert(more ^ (which == >> img_request->obj_request_count)); >> >> Without this assert, reads that hit errors would hang forever, since >> the block layer considered them incomplete. >> >> Fixes: http://tracker.ceph.com/issues/5647 >> Signed-off-by: Josh Durgin >> --- >> drivers/block/rbd.c | 14 +++++++------- >> 1 files changed, 7 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c >> index 0d669ae..f8fd7d3 100644 >> --- a/drivers/block/rbd.c >> +++ b/drivers/block/rbd.c >> @@ -1557,11 +1557,12 @@ rbd_img_obj_request_read_callback(struct >> rbd_obj_request *obj_request) >> obj_request, obj_request->img_request, obj_request->result, >> xferred, length); >> /* >> - * ENOENT means a hole in the image. We zero-fill the >> - * entire length of the request. A short read also implies >> - * zero-fill to the end of the request. Either way we >> - * update the xferred count to indicate the whole request >> - * was satisfied. >> + * ENOENT means a hole in the image. We zero-fill the entire >> + * length of the request. A short read also implies zero-fill >> + * to the end of the request. An error requires the whole >> + * length of the request to be reported finished with an error >> + * to the block layer. In each case we update the xferred >> + * count to indicate the whole request was satisfied. >> */ >> rbd_assert(obj_request->type != OBJ_REQUEST_NODATA); >> if (obj_request->result == -ENOENT) { >> @@ -1570,14 +1571,13 @@ rbd_img_obj_request_read_callback(struct >> rbd_obj_request *obj_request) >> else >> zero_pages(obj_request->pages, 0, length); >> obj_request->result = 0; >> - obj_request->xferred = length; >> } else if (xferred < length && !obj_request->result) { >> if (obj_request->type == OBJ_REQUEST_BIO) >> zero_bio_chain(obj_request->bio_list, xferred); >> else >> zero_pages(obj_request->pages, xferred, length); >> - obj_request->xferred = length; >> } >> + obj_request->xferred = length; >> obj_request_done_set(obj_request); >> } >> >>