From: Kevin Wolf <kwolf@redhat.com>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com,
blauwirbel@gmail.com, ohmura.kei@lab.ntt.co.jp, avi@redhat.com,
psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
Date: Wed, 19 Jan 2011 14:50:57 +0100
Message-ID: <4D36EC41.5050104@redhat.com>
In-Reply-To: <AANLkTinrjduWRudObXB61sYaFyHT0SOfyajiOmfxNaxS@mail.gmail.com>
On 19.01.2011 14:04, Yoshiaki Tamura wrote:
>>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
>>> +{
>>> + BlockDriverState *bs;
>>> +
>>> + bs = bdrv_find(blk_req->device_name);
>>
>> Please store the BlockDriverState in blk_req. This code loops over all
>> block devices and does a string comparison - and that for each request.
>> You can also save the qemu_strdup() when creating the request.
>>
>> In the few places where you really need the device name (might be the
>> case for load/save, I'm not sure), you can still get it from the
>> BlockDriverState.
>
> I would do so for the primary side. Although we haven't
> implemented yet, we want to replay block requests from block
> layer on the secondary side, and need device name to restore
> BlockDriverState.
Hm, I see. I'm not happy about it, but I don't have an immediate
suggestion for how to avoid it.
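For the primary side, the idea would look roughly like the sketch below. It is only an illustration: Device, Request, find_device(), request_init() and request_device_name() are hypothetical stand-ins, not the real BlockDriverState, EventTapBlkReq or bdrv_find().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the QEMU definitions. */
typedef struct Device {
    const char *name;
} Device;

typedef struct Request {
    Device *dev;    /* resolved once, when the request is created */
} Request;

/* Linear search with one string comparison per entry. This is the
 * per-request cost that storing the pointer avoids. Returns NULL if
 * no device has that name. */
static Device *find_device(Device *devs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].name, name) == 0) {
            return &devs[i];
        }
    }
    return NULL;
}

/* Keep the pointer in the request: no lookup and no string copy. */
static void request_init(Request *req, Device *dev)
{
    req->dev = dev;
}

/* Where the name is really needed (e.g. when saving state), it is
 * still reachable through the stored pointer. */
static const char *request_device_name(const Request *req)
{
    return req->dev->name;
}
```

This does not solve the replay case on the secondary side, where only the name is available and some lookup is still required.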
>>
>>> +
>>> + if (blk_req->is_flush) {
>>> + bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>>
>> You need to handle errors. If bdrv_aio_flush returns NULL, call the
>> callback with -EIO.
>
> I'll do so.
>
>>
>>> + return;
>>> + }
>>> +
>>> + bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>>> + blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>>> + blk_req->reqs[0].opaque);
>>
>> Same here.
>>
>>> + bdrv_flush(bs);
>>
>> This looks really strange. What is this supposed to do?
>>
>> One point is that you write it immediately after bdrv_aio_write, so you
>> get an fsync for which you don't know if it includes the current write
>> request or if it doesn't. Which data do you want to get flushed to the disk?
>
> I was expecting to flush the aio request that was just initiated.
> Am I misunderstanding the function?
Seems so. The function names don't use very clear terminology either,
so you're not the first one to fall into this trap. Basically we have:
* qemu_aio_flush() waits for all AIO requests to complete. I think you
  wanted exactly this, but only for a single block device. Such a
  function doesn't exist yet.
* bdrv_flush() makes sure that all successfully completed requests are
  written to disk (by calling fsync).
* bdrv_aio_flush() is the asynchronous version of bdrv_flush(), i.e. it
  runs the fsync in the thread pool.
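To make the difference concrete, here is a toy model of the request states. It is not QEMU code: ToyDisk and the toy_* functions are made up for illustration, and the model simply assumes that a freshly submitted write has not completed yet (in reality it may or may not have, which is exactly the problem).

```c
#include <assert.h>

/* Toy model, not QEMU code: counts requests in each state. */
typedef struct ToyDisk {
    int in_flight;  /* submitted, completion callback not yet run */
    int completed;  /* completed, but possibly only in the host cache */
    int on_disk;    /* covered by an fsync */
} ToyDisk;

/* Plays the role of bdrv_aio_writev(): only submits the request. */
static void toy_aio_write(ToyDisk *d)
{
    d->in_flight++;
}

/* Plays the role of qemu_aio_flush(): wait until every in-flight
 * request has completed. Nothing is forced to disk here. */
static void toy_aio_drain(ToyDisk *d)
{
    d->completed += d->in_flight;
    d->in_flight = 0;
}

/* Plays the role of bdrv_flush(): an fsync that covers only requests
 * which had already completed when it was called. */
static void toy_flush(ToyDisk *d)
{
    d->on_disk += d->completed;
    d->completed = 0;
}
```

In this model, toy_aio_write() followed directly by toy_flush() leaves the write in flight and nothing on disk. Only toy_aio_drain() followed by toy_flush() moves it to on_disk.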
>> The other thing is that you introduce a bdrv_flush for each request,
>> basically forcing everyone to something very similar to writethrough
>> mode. I'm sure this will have a big impact on performance.
>
> The reason is to avoid inversion of queued requests. Although
> processing one-by-one is heavy, wouldn't having requests flushed
> to disk out of order break the disk image?
No, that's fine. If a guest issues two requests at the same time, they
may complete in any order. You just need to make sure that you don't
call the completion callback before the request really has completed.
I'm just starting to wonder whether the guest will time out the requests
if they are queued for too long. What's more, IDE can only handle one
request at a time, so not completing requests doesn't sound like a
good idea at all. At what intervals is the event-tap queue flushed?
On the other hand, if you complete before actually writing out, you
don't get timeouts, but you signal success to the guest when the request
could still fail. What would you do in this case? With a writeback cache
mode we're fine, we can just fail the next flush (until then nothing is
guaranteed to be on disk and order doesn't matter either), but with
cache=writethrough we're in serious trouble.
Have you thought about this problem? Maybe we end up having to flush the
event-tap queue for each single write in writethrough mode.
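For the writeback case, "fail the next flush" could be sketched as below. This is only a rough idea under my own assumptions: ToyQueue, toy_write_failed() and toy_guest_flush() are hypothetical names, and whether the latched error should be cleared once it has been reported is an open question that the sketch answers arbitrarily.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-device state, not part of the patch. */
typedef struct ToyQueue {
    /* First error from a write that was already acknowledged to the
     * guest; 0 means no error is pending. */
    int pending_error;
} ToyQueue;

/* Called when a queued write fails after success was already
 * signalled to the guest. Only the first error is remembered. */
static void toy_write_failed(ToyQueue *q, int err)
{
    if (q->pending_error == 0) {
        q->pending_error = err;
    }
}

/* Guest-visible flush: report a latched error if there is one,
 * otherwise succeed. The sketch clears the error after reporting. */
static int toy_guest_flush(ToyQueue *q)
{
    int ret = q->pending_error;
    q->pending_error = 0;
    return ret;
}
```

With cache=writethrough there is no later flush to carry the error, which is why this approach doesn't help there.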
>> Additionally, error handling is missing.
>
> I looked at the codes using bdrv_flush and realized some of them
> doesn't handle errors, but scsi-disk.c does. Should everyone
> handle errors or depends on the usage?
I added the return code only recently; it was a void function
previously. Probably some error handling should be added to all of them.
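As a rough sketch of both kinds of error handling mentioned in this mail (checking the return code of a synchronous flush, and calling the callback with -EIO when an AIO submit returns NULL): the toy_* functions below are stand-ins with a "fail" switch for demonstration, not the real bdrv_flush() or bdrv_aio_flush(), and the 0 / negative errno return convention is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);
typedef struct ToyAIOCB { int unused; } ToyAIOCB;

/* Stand-in for an AIO submit function that returns NULL when the
 * request could not be submitted. 'fail' only exists for the demo. */
static ToyAIOCB *toy_aio_submit(int fail, CompletionFunc *cb, void *opaque)
{
    static ToyAIOCB acb;
    (void)cb;
    (void)opaque;
    return fail ? NULL : &acb;
}

/* If submission fails, nobody else will ever call the callback, so
 * complete the request here with -EIO. */
static void submit_checked(int fail, CompletionFunc *cb, void *opaque)
{
    if (toy_aio_submit(fail, cb, opaque) == NULL) {
        cb(opaque, -EIO);
    }
}

/* Stand-in for a synchronous flush: 0 on success, negative errno on
 * failure (assumed convention). */
static int toy_flush(int fail)
{
    return fail ? -EIO : 0;
}

/* Propagate the flush result instead of ignoring it. */
static int flush_checked(int fail)
{
    int ret = toy_flush(fail);
    if (ret < 0) {
        return ret;
    }
    return 0;
}

/* Example callback: stores the result where the caller can see it. */
static void record_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

On the success path of submit_checked() the callback is deliberately not called; in the real code it would run later, when the request completes.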
Kevin