From: Laurent Vivier <Laurent.Vivier@bull.net>
To: Kevin Wolf <kwolf@suse.de>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] Align file accesses with cache=off (O_DIRECT)
Date: Tue, 29 Apr 2008 17:48:24 +0200
Message-ID: <1209484105.4248.27.camel@frecb07144>
In-Reply-To: <48173591.9010609@suse.de>
On Tuesday, 29 April 2008 at 16:49 +0200, Kevin Wolf wrote:
> Hi Laurent,
>
> Laurent Vivier wrote:
> > But if we want to keep things simple and avoid memcpy(), we could just
> > deactivate O_DIRECT on pread() or pwrite():
>
> You're right, this approach is simpler, better and makes the patch
> smaller. I've attached a new version of the patch.
>
> However, now that you've pointed me to your patch I realize that my
> patch might be simple but it is incomplete as well. Normal read/write
> operation on qcow images should work fine (only metadata is unaligned
> and there pread is used). Snapshots don't work though because they end
> up in aio requests which are not routed to raw_pread.
>
> Disabling O_DIRECT for a single aio request is impossible (after all,
> aio is asynchronous), and disabling it for at least one aio request is
Perhaps I'm wrong, but I think it is possible: the only consequence is
that the asynchronous I/O becomes synchronous...
> going to be ugly. So maybe we better turn O_DIRECT off for snapshot
> saving/loading, even if it's not the generic fix I wanted to have when I
> started.
I don't think that is a good idea.
In the Linux world, there are three reasons to use O_DIRECT:
1- to use Linux AIO (not POSIX AIO);
2- to avoid a buffer copy between user space and kernel space
(performance?);
3- to increase reliability: with O_DIRECT you can be sure your data is
on the disk when the write completes, so the system can crash afterwards
(if it wants to).
And I think reliability is exactly what we want while the snapshot is
being saved...
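To make reasons 2 and 3 concrete, here is a minimal sketch of an O_DIRECT write that also asks for durability. The helper name write_block_durably and the 4096-byte block size are assumptions made for this illustration (the patch itself uses 512); nothing here is taken from QEMU. Note that open(2) documents O_DIRECT alone as not giving the guarantees of O_SYNC, so the sketch calls fdatasync() explicitly.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* no O_DIRECT here: plain buffered I/O */
#endif

/* Assumed block size; large enough for 512-byte and 4K-sector devices. */
#define DIRECT_BLOCK_SIZE 4096

/* Hypothetical helper: write one zero-padded block at offset 0, bypassing
 * the page cache when possible, then flush it. Returns 0 or -1. */
static int write_block_durably(const char *path, const void *data, size_t len)
{
    void *buf = NULL;
    int fd, ret = -1;

    assert(path != NULL && data != NULL);
    if (len > DIRECT_BLOCK_SIZE)
        return -1;

    /* O_DIRECT requires an aligned buffer, offset and length. */
    if (posix_memalign(&buf, DIRECT_BLOCK_SIZE, DIRECT_BLOCK_SIZE) != 0)
        return -1;
    memset(buf, 0, DIRECT_BLOCK_SIZE);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL) {
        /* Some filesystems reject O_DIRECT; fall back to buffered I/O. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    }
    if (fd < 0)
        goto out;

    /* The explicit flush is what provides the durability guarantee. */
    if (pwrite(fd, buf, DIRECT_BLOCK_SIZE, 0) == DIRECT_BLOCK_SIZE &&
        fdatasync(fd) == 0)
        ret = 0;
    if (close(fd) != 0)
        ret = -1;
out:
    free(buf);
    return ret;
}
```

A caller would use it as write_block_durably("/some/file", data, len); the aligned bounce buffer is the price paid for skipping the page cache.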
> I'm still undecided, though. What do you think?
Is it possible to align the last AIO request?
Also, see my comments below.
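One possible reading of "aligning" an unaligned request is to widen it to sector boundaries and transfer it through an aligned bounce buffer. The sketch below shows only the arithmetic; struct aligned_span and align_request are hypothetical names, and the 512-byte sector size is the one assumed by the patch. This approach does need a copy, which is the cost the fcntl() variant tries to avoid.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512   /* alignment assumed throughout the patch */

/* Hypothetical description of the aligned window covering a request. */
struct aligned_span {
    int64_t start;   /* request offset rounded down to a sector boundary */
    int64_t length;  /* bytes to transfer, a multiple of SECTOR_SIZE */
    int64_t skip;    /* bytes between start and the caller's offset */
};

/* Widen [offset, offset + count) to whole sectors. */
static struct aligned_span align_request(int64_t offset, int64_t count)
{
    struct aligned_span span;
    int64_t end;

    assert(offset >= 0 && count >= 0);

    span.start = offset - offset % SECTOR_SIZE;
    /* Round the end of the request up to the next sector boundary. */
    end = (offset + count + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
    span.length = end - span.start;
    span.skip = offset - span.start;
    return span;
}
```

The caller would read span.length bytes at span.start into an aligned buffer and then copy count bytes starting at buffer + span.skip. A request that is already aligned comes back unchanged with skip set to 0.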
> Index: block-raw-posix.c
> ===================================================================
> --- block-raw-posix.c.orig
> +++ block-raw-posix.c
> @@ -77,6 +77,7 @@
> typedef struct BDRVRawState {
> int fd;
> int type;
> + int flags;
> unsigned int lseek_err_cnt;
> #if defined(__linux__)
> /* linux floppy specific */
> @@ -95,6 +96,7 @@ static int raw_open(BlockDriverState *bs
> BDRVRawState *s = bs->opaque;
> int fd, open_flags, ret;
>
> + s->flags = flags;
> s->lseek_err_cnt = 0;
I think you should store open_flags instead of flags (see below).
> open_flags = O_BINARY;
> @@ -141,7 +143,14 @@ static int raw_open(BlockDriverState *bs
> #endif
> */
>
> -static int raw_pread(BlockDriverState *bs, int64_t offset,
> +/*
> + * offset and count are in bytes, but must be multiples of 512 for files
> + * opened with O_DIRECT. buf must be aligned to 512 bytes then.
> + *
> + * This function may be called without alignment if the caller ensures
> + * that O_DIRECT is not in effect.
> + */
> +static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
> uint8_t *buf, int count)
> {
> BDRVRawState *s = bs->opaque;
> @@ -194,7 +203,14 @@ label__raw_read__success:
> return ret;
> }
>
> -static int raw_pwrite(BlockDriverState *bs, int64_t offset,
> +/*
> + * offset and count are in bytes, but must be multiples of 512 for files
> + * opened with O_DIRECT. buf must be aligned to 512 bytes then.
> + *
> + * This function may be called without alignment if the caller ensures
> + * that O_DIRECT is not in effect.
> + */
> +static int raw_pwrite_aligned(BlockDriverState *bs, int64_t offset,
> const uint8_t *buf, int count)
> {
> BDRVRawState *s = bs->opaque;
> @@ -230,6 +246,69 @@ label__raw_write__success:
> return ret;
> }
>
> +
> +#ifdef O_DIRECT
> +/*
> + * offset and count are in bytes and possibly not aligned. For files opened
> + * with O_DIRECT, necessary alignments are ensured before calling
> + * raw_pread_aligned to do the actual read.
> + */
> +static int raw_pread(BlockDriverState *bs, int64_t offset,
> + uint8_t *buf, int count)
> +{
> + BDRVRawState *s = bs->opaque;
> +
> + if (unlikely((s->flags & BDRV_O_DIRECT) &&
> + (offset % 512 != 0 || (uintptr_t) buf % 512))) {
> +
> + int flags, ret;
> +
> + // Temporarily disable O_DIRECT for unaligned access
> + flags = fcntl(s->fd, F_GETFL);
> + fcntl(s->fd, F_SETFL, flags & ~O_DIRECT);
> + ret = raw_pread_aligned(bs, offset, buf, count);
> + fcntl(s->fd, F_SETFL, flags);
> +
> + return ret;
If you store open_flags instead of flags, you can do:

    if (unlikely((s->open_flags & O_DIRECT) &&
                 (offset % 512 || (uintptr_t) buf % 512))) {
        fcntl(s->fd, F_SETFL, s->open_flags & ~O_DIRECT);
        ret = raw_pread_aligned(bs, offset, buf, count);
        fcntl(s->fd, F_SETFL, s->open_flags);
        return ret;
    }
> + } else {
> + return raw_pread_aligned(bs, offset, buf, count);
> + }
> +}
> +
> +/*
> + * offset and count are in bytes and possibly not aligned. For files opened
> + * with O_DIRECT, necessary alignments are ensured before calling
> + * raw_pwrite_aligned to do the actual write.
> + */
> +static int raw_pwrite(BlockDriverState *bs, int64_t offset,
> + const uint8_t *buf, int count)
> +{
> + BDRVRawState *s = bs->opaque;
> +
> + if (unlikely((s->flags & BDRV_O_DIRECT) &&
> + (offset % 512 != 0 || (uintptr_t) buf % 512))) {
> +
> + int flags, ret;
> +
> + // Temporarily disable O_DIRECT for unaligned access
> + flags = fcntl(s->fd, F_GETFL);
> + fcntl(s->fd, F_SETFL, flags & ~O_DIRECT);
> + ret = raw_pwrite_aligned(bs, offset, buf, count);
> + fcntl(s->fd, F_SETFL, flags);
Ditto: the same change applies here.
> + return ret;
> + } else {
> + return raw_pwrite_aligned(bs, offset, buf, count);
> + }
> +}
> +
> +#else
> +#define raw_pread raw_pread_aligned
> +#define raw_pwrite raw_pwrite_aligned
> +#endif
> +
> +
> /***********************************************************/
> /* Unix AIO using POSIX AIO */
>
>
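The fcntl() toggle discussed in the review comments above can be sketched as a standalone function. This is an illustration under stated assumptions, not QEMU code: struct raw_state is a hypothetical stand-in for BDRVRawState, pread_maybe_unaligned is an invented name, and the extra check on count is an addition that the quoted patch does not make.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0        /* no O_DIRECT on this platform: nothing to toggle */
#endif

#define SECTOR_SIZE 512   /* alignment assumed by the patch */

/* Hypothetical stand-in for BDRVRawState: only the fields used here. */
struct raw_state {
    int fd;
    int open_flags;       /* flags passed to open(), saved at open time */
};

/* Read count bytes at offset. For an unaligned request on an O_DIRECT
 * file, clear O_DIRECT for the duration of the read and then restore
 * the saved flags. */
static ssize_t pread_maybe_unaligned(struct raw_state *s, int64_t offset,
                                     uint8_t *buf, size_t count)
{
    ssize_t ret;

    assert(s != NULL && buf != NULL);

    if ((s->open_flags & O_DIRECT) &&
        (offset % SECTOR_SIZE || (uintptr_t)buf % SECTOR_SIZE ||
         count % SECTOR_SIZE)) {
        /* F_SETFL ignores access mode and creation flags, so passing
         * the saved open() flags back is safe. */
        if (fcntl(s->fd, F_SETFL, s->open_flags & ~O_DIRECT) < 0)
            return -1;
        ret = pread(s->fd, buf, count, (off_t)offset);
        /* A failure to restore the flags is ignored in this sketch. */
        fcntl(s->fd, F_SETFL, s->open_flags);
        return ret;
    }
    return pread(s->fd, buf, count, (off_t)offset);
}
```

Because F_SETFL changes the flags of the open file description itself, any other request in flight on the same descriptor sees the change too. That is why the toggle fits the synchronous pread()/pwrite() path more naturally than the AIO path.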
Regards,
Laurent
--
------------- Laurent.Vivier@bull.net ---------------
"The best way to predict the future is to invent it."
- Alan Kay