Subject: [Qemu-devel] Re: [PATCH v2 1/2] Add bdrv_aio_multiwrite
From: Kevin Wolf
To: Juan Quintela
Cc: qemu-devel@nongnu.org
Date: Wed, 09 Sep 2009 12:44:49 +0200
Message-ID: <4AA78721.8090208@redhat.com>

On 08.09.2009 15:20, Juan Quintela wrote:
> Kevin Wolf wrote:
>> One performance problem of qcow2 during initial image growth is
>> sequential writes that are not cluster aligned. In this case, when a
>> first request requires allocating a new cluster but writes only to the
>> first couple of sectors in that cluster, the rest of the cluster is
>> zeroed - just to be overwritten by the second request that follows and
>> fills up the cluster.
>>
>> Let's try to merge sequential write requests to the same cluster, so we
>> can avoid writing the zero padding to the disk in the first place.
>>
>> As a nice side effect, other formats also take advantage of dealing
>> with fewer and larger requests.
>>
>> Signed-off-by: Kevin Wolf
>> ---
>>  block.c       |  147 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block.h       |   15 ++++++
>>  block_int.h   |    3 +
>>  cutils.c      |   17 +++++++
>>  qemu-common.h |    1 +
>>  5 files changed, 183 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 033957d..d9cc257 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1354,6 +1354,153 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>>      return ret;
>>  }
>>
>> +
>> +typedef struct MultiwriteCB {
>> +    int error;
>> +    int num_requests;
>> +    int num_callbacks;
>> +    struct {
>> +        BlockDriverCompletionFunc *cb;
>> +        void *opaque;
>> +        QEMUIOVector *free_qiov;
>> +    } callbacks[];
>> +} MultiwriteCB;
>> +
>> +static void multiwrite_user_cb(MultiwriteCB *mcb)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < mcb->num_callbacks; i++) {
>> +        mcb->callbacks[i].cb(mcb->callbacks[i].opaque, mcb->error);
>> +        qemu_free(mcb->callbacks[i].free_qiov);
>> +    }
>> +}
>> +
>> +static void multiwrite_cb(void *opaque, int ret)
>> +{
>> +    MultiwriteCB *mcb = opaque;
>> +
>> +    if (ret < 0) {
>> +        mcb->error = ret;
>> +        multiwrite_user_cb(mcb);
>> +    }
>> +
>> +    mcb->num_requests--;
>> +    if (mcb->num_requests == 0) {
>> +        if (mcb->error == 0) {
>> +            multiwrite_user_cb(mcb);
>> +        }
>> +        qemu_free(mcb);
>> +    }
>> +}
>> +
>> +static int multiwrite_req_compare(const void *a, const void *b)
>> +{
>> +    return (((BlockRequest*) a)->sector - ((BlockRequest*) b)->sector);
>> +}
>> +
>> +static void multiwrite_merge(BlockDriverState *bs, BlockRequest *reqs,
>> +    int num_reqs, MultiwriteCB *mcb)
>> +{
>> +    int i, outidx;
>> +
>> +    // Sort requests by start sector
>> +    qsort(reqs, num_reqs, sizeof(*reqs), &multiwrite_req_compare);
>> +
>> +    // Check if adjacent requests touch the same clusters. If so,
>> +    // combine them, filling up gaps with zero sectors.
>> +    outidx = 0;
>> +    for (i = 1; i < num_reqs; i++) {
>> +        int merge = 0;
>> +
>> +        // This is only handling exactly sequential writes - all block
>> +        // drivers can benefit from merging in this case. For other
>> +        // patterns we need to ask the driver.
>> +        if (reqs[i].sector == reqs[outidx].sector + reqs[outidx].nb_sectors) {
>> +            merge = 1;
>
> Shouldn't we handle overlapped writes? It didn't look too difficult
> once you have this infrastructure.

Right, we could handle overlapping writes here in generic code. But I'd
prefer to start with something simple and add such extensions later. I'm
not even sure there is any point in optimizing overlapping write
requests - there is little reason for a guest to submit them in the
first place.

What happens more frequently is that two requests touch the same
cluster, but some sectors between the end of the first request and the
start of the second one are left untouched. As long as the cluster is
unallocated, we can optimize this case in a format-specific way by
inserting zero padding...

>> +        }
>> +
>> +#if 0
>> +        if (!merge && drv->bdrv_check_merge) {
>> +            merge = drv->bdrv_check_merge(bs, &reqs[outidx], &reqs[i]);
>> +        }
>> +#endif
>
> This can/should be removed :)

...and bringing this code to life. :-)

To put what I said above into numbers, here are some statistics from a
RHEL installation:

Possible merges:       45830
Merged:                18141
Gap between requests:  27689
Overlapping:           0
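To make the gap case concrete, here is a rough sketch of how such a
format-specific merge could pad the hole with zeros, once the driver has
confirmed that the cluster is unallocated. This is a hypothetical,
untested helper: it builds on the qemu_iovec_* helpers (including the
qemu_iovec_concat that this series adds in cutils.c), and freeing the
padding buffer after completion is left out.

static void multiwrite_merge_gap(BlockRequest *a, BlockRequest *b,
    QEMUIOVector *qiov)
{
    // Sectors left untouched between the end of a and the start of b
    int gap_sectors = b->sector - (a->sector + a->nb_sectors);

    // Zero padding for the gap; it must stay allocated until the merged
    // request completes, so a real version would free it in the
    // completion path
    void *gap_buf = qemu_mallocz(gap_sectors * 512);

    qemu_iovec_init(qiov, a->qiov->niov + b->qiov->niov + 1);

    // First request, then zero padding, then the second request
    qemu_iovec_concat(qiov, a->qiov, a->qiov->size);
    qemu_iovec_add(qiov, gap_buf, gap_sectors * 512);
    qemu_iovec_concat(qiov, b->qiov, b->qiov->size);

    a->nb_sectors += gap_sectors + b->nb_sectors;
    a->qiov = qiov;
}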
>> +
>> +        if (merge) {
>> +            reqs[outidx].nb_sectors += reqs[i].nb_sectors;
>
> This doesn't handle overlapping writes. Is there any test to make sure
> that we aren't getting overlapping writes? Otherwise, adding support
> should be trivial.

Currently overlapping writes are never merged, so let's leave this for a
patch on top that actually implements this functionality.

Kevin
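For reference, such a follow-up could extend multiwrite_merge roughly
along these lines. This is only a sketch, untested: it assumes the later
request extends beyond the end of the earlier one and that its data wins
where the two overlap, and it reuses the qemu_iovec_concat helper
introduced by this series.

        // Merge a request that starts inside the previous one but ends
        // behind it; the sort guarantees reqs[i].sector >= reqs[outidx].sector
        if (reqs[i].sector < reqs[outidx].sector + reqs[outidx].nb_sectors &&
            reqs[i].sector + reqs[i].nb_sectors >
                reqs[outidx].sector + reqs[outidx].nb_sectors) {

            QEMUIOVector *qiov = qemu_mallocz(sizeof(*qiov));
            size_t head = (reqs[i].sector - reqs[outidx].sector) * 512;

            qemu_iovec_init(qiov,
                reqs[outidx].qiov->niov + reqs[i].qiov->niov);

            // Keep the first request only up to where the second one
            // starts; from there on the second request overwrites it
            qemu_iovec_concat(qiov, reqs[outidx].qiov, head);
            qemu_iovec_concat(qiov, reqs[i].qiov, reqs[i].qiov->size);

            reqs[outidx].nb_sectors = reqs[i].sector + reqs[i].nb_sectors
                - reqs[outidx].sector;
            reqs[outidx].qiov = qiov;

            // Have the completion path free the merged qiov, as the
            // free_qiov field in MultiwriteCB suggests
            mcb->callbacks[i].free_qiov = qiov;
        }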