From: Jamie Lokier <jamie@shareable.org>
To: Christoph Hellwig <hch@lst.de>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 2/3] barriers: block-raw-posix barrier support
Date: Tue, 5 May 2009 13:33:11 +0100
Message-ID: <20090505123311.GD25328@shareable.org>
In-Reply-To: <20090505120836.GB30721@lst.de>
Christoph Hellwig wrote:
> Add support for write barriers to the posix raw file / block device code.
> The guts of this is in the aio emulation as that's where we handle our queue
> of outstanding requests.
It's nice to see this :-)
The IDE and SCSI cache flush commands should map onto it nicely too.
> The highlevel design is the following:
>
> - As soon as a barrier request is submitted via qemu_paio_submit we
> increment the barrier_inprogress count to signal we now have to
> deal with barriers.
> - From that point on every new request that is queued up by
> qemu_paio_submit does not get onto the normal request list but a
> secondary post-barrier queue
>
> - Once the barrier request is dequeued by an aio_thread that thread waits for
> all other outstanding requests to finish, issues an fdatasync, the actual
> barrier request, another fdatasync to prevent reordering in the pagecache.
You don't need two fdatasyncs if the barrier request is a pure barrier
with no data write, used only by a guest's fsync/fdatasync
implementation to flush previously written data.
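A rough sketch of what I mean, in C. barrier_request() is a made-up
helper, not anything from the patch or from QEMU, and it assumes the
caller has already waited for all earlier requests on the fd to finish.
Passing no data stands for the pure flush case:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not QEMU code).
 * Assumes all earlier requests on fd have already completed.
 * buf == NULL or len == 0 means a pure flush barrier with no payload.
 * Returns 0 on success, -1 on error.
 */
int barrier_request(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    /* Commit everything that was written before the barrier. */
    if (fdatasync(fd) < 0)
        return -1;

    /* Pure flush: nothing is written after the first sync, so a
     * second sync would have nothing left to order. */
    if (buf == NULL || len == 0)
        return 0;

    /* The barrier carries data: write all of it... */
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
        offset += n;
    }

    /* ...then sync again so that later writes cannot be reordered
     * ahead of it in the page cache. */
    return fdatasync(fd) < 0 ? -1 : 0;
}
```

With a payload the helper issues sync, write, sync, as in your
description; without one it stops after the first fdatasync.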
> After the request is finished the barrier_inprogress counter is decrement,
> the post-barrier list is splice back onto the main request list up to and
> including the next barrier request if there is one and normal operation
> is resumed.
>
> That means barrier mean a quite massive serialization of the I/O submission
> path, which unfortunately is not avoidable given their semantics.
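For reference, the queueing scheme quoted above can be modelled roughly
as follows. This is a single-threaded sketch with made-up names (struct
aio_state, submit, barrier_done); only barrier_inprogress comes from the
description. Locking, the worker threads and the actual I/O are left
out, so it is not the code from the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* All types and functions here are hypothetical illustrations. */
struct req {
    bool is_barrier;
    struct req *next;
};

struct queue {
    struct req *head, *tail;
};

struct aio_state {
    struct queue main_q;     /* normal request list */
    struct queue post_q;     /* secondary post-barrier queue */
    int barrier_inprogress;  /* barriers submitted but not yet finished */
};

static void q_push(struct queue *q, struct req *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

static struct req *q_pop(struct queue *q)
{
    struct req *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Queue a request: once a barrier is pending, everything submitted
 * after it is parked on the post-barrier queue. */
void submit(struct aio_state *s, struct req *r)
{
    if (s->barrier_inprogress > 0)
        q_push(&s->post_q, r);
    else
        q_push(&s->main_q, r);
    if (r->is_barrier)
        s->barrier_inprogress++;
}

/* Called when a barrier request has finished: splice parked requests
 * back onto the main list, up to and including the next barrier. */
void barrier_done(struct aio_state *s)
{
    struct req *r;

    s->barrier_inprogress--;
    while ((r = q_pop(&s->post_q)) != NULL) {
        q_push(&s->main_q, r);
        if (r->is_barrier)
            break;
    }
}
```

Submitting write, barrier, write, barrier, write leaves the first two on
the main list and parks the other three until the first barrier
completes.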
This is the best argument yet for having distinct "barrier" and "sync"
operations. "Barrier" is for ordering I/O, as journalling filesystems
need.
"Sync" is to be sent after guest fsync/fdatasync, to commit data
written so far to storage. It waits for the data to be committed, and
also asks for the data to be written sooner.
"Sync" operations don't need to serialise I/O as much: it's ok to
initiate later writes in parallel, and this is enough to keep the
storage busy when there's a steady stream of guest fsyncs.
Both together, "Barrier|Sync" would do what you've implemented: force
ordering, write data quickly, and wait until it's committed to hard
storage.
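To make the distinction concrete, here is a small C sketch. The flag
names REQ_BARRIER and REQ_SYNC and the describe_request() helper are
hypothetical; nothing in QEMU, virtio or Linux defines them. It models
only the semantics described above, not how a host would implement
them:

```c
#include <stdbool.h>

/* Hypothetical flag bits, one per concept. */
enum {
    REQ_BARRIER = 1 << 0,  /* order I/O around this request */
    REQ_SYNC    = 1 << 1   /* commit data written so far to storage */
};

/* What a request with the given flags promises to the guest. */
struct req_semantics {
    bool ordered;                  /* later I/O must not pass this request */
    bool committed;                /* completes only once data is on storage */
    bool later_writes_may_overlap; /* later writes may start while in flight */
};

struct req_semantics describe_request(unsigned flags)
{
    struct req_semantics s;

    s.ordered = (flags & REQ_BARRIER) != 0;
    s.committed = (flags & REQ_SYNC) != 0;
    /* Only the ordering bit forces later writes to wait. */
    s.later_writes_may_overlap = !s.ordered;
    return s;
}
```

describe_request(REQ_SYNC) still allows later writes to overlap, while
describe_request(REQ_BARRIER | REQ_SYNC) is both ordered and committed,
which is the combined case.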
Although Linux doesn't separate these two concepts (yet), because of
I/O serialisation it might make a measurable difference to fsync-heavy
workloads for virtio to have two separate bits, one for each concept,
and then add the necessary tweaks to guest kernels to use only one or
both bits as needed.
-- Jamie