From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1JqrEQ-00060e-8P for qemu-devel@nongnu.org; Tue, 29 Apr 2008 10:54:58 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1JqrEO-000602-Ic for qemu-devel@nongnu.org; Tue, 29 Apr 2008 10:54:57 -0400 Received: from [199.232.76.173] (port=43546 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1JqrEO-0005zz-BY for qemu-devel@nongnu.org; Tue, 29 Apr 2008 10:54:56 -0400 Received: from ns.suse.de ([195.135.220.2] helo=mx1.suse.de) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1JqrEN-0007jw-Ok for qemu-devel@nongnu.org; Tue, 29 Apr 2008 10:54:56 -0400 Message-ID: <48173591.9010609@suse.de> Date: Tue, 29 Apr 2008 16:49:53 +0200 From: Kevin Wolf MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH] Align file accesses with cache=off (O_DIRECT) References: <4807514B.9040607@suse.de> <4815EE89.70407@suse.de> <1209459667.4328.7.camel@frecb07144> In-Reply-To: <1209459667.4328.7.camel@frecb07144> Content-Type: multipart/mixed; boundary="------------040007040902070700030302" Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Laurent Vivier Cc: qemu-devel@nongnu.org This is a multi-part message in MIME format. --------------040007040902070700030302 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Hi Laurent, Laurent Vivier schrieb: > But if we want too keep simplicity without memcpy(), we could only > de-activate O_DIRECT on pread() or pwrite() : You're right, this approach is simpler, better and makes the patch smaller. I've attached a new version of the patch. However, now that you've pointed me to your patch I realize that my patch might be simple but it is incomplete as well. Normal read/write operation on qcow images should work fine (only metadata is unaligned and there pread is used). Snapshots don't work though because they end up in aio requests which are not routed to raw_pread. Disabling O_DIRECT for a single aio request is impossible (after all, aio is asynchronous), and disabling it for at least one aio request is going to be ugly. So maybe we better turn O_DIRECT off for snapsnot saving/loading, even if it's not the generic fix I wanted to have when I started. I'm still undecided, though. What do you think? Kevin --------------040007040902070700030302 Content-Type: text/x-patch; name="align-odirect-accesses.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="align-odirect-accesses.patch" Index: block-raw-posix.c =================================================================== --- block-raw-posix.c.orig +++ block-raw-posix.c @@ -77,6 +77,7 @@ typedef struct BDRVRawState { int fd; int type; + int flags; unsigned int lseek_err_cnt; #if defined(__linux__) /* linux floppy specific */ @@ -95,6 +96,7 @@ static int raw_open(BlockDriverState *bs BDRVRawState *s = bs->opaque; int fd, open_flags, ret; + s->flags = flags; s->lseek_err_cnt = 0; open_flags = O_BINARY; @@ -141,7 +143,14 @@ static int raw_open(BlockDriverState *bs #endif */ -static int raw_pread(BlockDriverState *bs, int64_t offset, +/* + * offset and count are in bytes, but must be multiples of 512 for files + * opened with O_DIRECT. buf must be aligned to 512 bytes then. + * + * This function may be called without alignment if the caller ensures + * that O_DIRECT is not in effect. + */ +static int raw_pread_aligned(BlockDriverState *bs, int64_t offset, uint8_t *buf, int count) { BDRVRawState *s = bs->opaque; @@ -194,7 +203,14 @@ label__raw_read__success: return ret; } -static int raw_pwrite(BlockDriverState *bs, int64_t offset, +/* + * offset and count are in bytes, but must be multiples of 512 for files + * opened with O_DIRECT. buf must be aligned to 512 bytes then. + * + * This function may be called without alignment if the caller ensures + * that O_DIRECT is not in effect. + */ +static int raw_pwrite_aligned(BlockDriverState *bs, int64_t offset, const uint8_t *buf, int count) { BDRVRawState *s = bs->opaque; @@ -230,6 +246,69 @@ label__raw_write__success: return ret; } + +#ifdef O_DIRECT +/* + * offset and count are in bytes and possibly not aligned. For files opened + * with O_DIRECT, necessary alignments are ensured before calling + * raw_pread_aligned to do the actual read. + */ +static int raw_pread(BlockDriverState *bs, int64_t offset, + uint8_t *buf, int count) +{ + BDRVRawState *s = bs->opaque; + + if (unlikely((s->flags & BDRV_O_DIRECT) && + (offset % 512 != 0 || (uintptr_t) buf % 512))) { + + int flags, ret; + + // Temporarily disable O_DIRECT for unaligned access + flags = fcntl(s->fd, F_GETFL); + fcntl(s->fd, F_SETFL, flags & ~O_DIRECT); + ret = raw_pread_aligned(bs, offset, buf, count); + fcntl(s->fd, F_SETFL, flags); + + return ret; + + } else { + return raw_pread_aligned(bs, offset, buf, count); + } +} + +/* + * offset and count are in bytes and possibly not aligned. For files opened + * with O_DIRECT, necessary alignments are ensured before calling + * raw_pwrite_aligned to do the actual write. + */ +static int raw_pwrite(BlockDriverState *bs, int64_t offset, + const uint8_t *buf, int count) +{ + BDRVRawState *s = bs->opaque; + + if (unlikely((s->flags & BDRV_O_DIRECT) && + (offset % 512 != 0 || (uintptr_t) buf % 512))) { + + int flags, ret; + + // Temporarily disable O_DIRECT for unaligned access + flags = fcntl(s->fd, F_GETFL); + fcntl(s->fd, F_SETFL, flags & ~O_DIRECT); + ret = raw_pwrite_aligned(bs, offset, buf, count); + fcntl(s->fd, F_SETFL, flags); + + return ret; + } else { + return raw_pwrite_aligned(bs, offset, buf, count); + } +} + +#else +#define raw_pread raw_pread_aligned +#define raw_pwrite raw_pwrite_aligned +#endif + + /***********************************************************/ /* Unix AIO using POSIX AIO */ --------------040007040902070700030302--