From: Kevin Wolf <kwolf@redhat.com>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: aliguori@us.ibm.com, dlaor@redhat.com, ananth@in.ibm.com,
kvm@vger.kernel.org, mst@redhat.com, mtosatti@redhat.com,
qemu-devel@nongnu.org, vatsa@linux.vnet.ibm.com,
blauwirbel@gmail.com, ohmura.kei@lab.ntt.co.jp, avi@redhat.com,
psuriset@linux.vnet.ibm.com, stefanha@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
Date: Thu, 20 Jan 2011 10:15:20 +0100
Message-ID: <4D37FD28.8000402@redhat.com>
In-Reply-To: <AANLkTik_oHT71ggEiUN+vfBJw_CczzXW7LsyRqogq5pF@mail.gmail.com>

On 20.01.2011 06:19, Yoshiaki Tamura wrote:
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>>>>> + blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>>>>> + blk_req->reqs[0].opaque);
>>>>
>>>> Same here.
>>>>
>>>>> + bdrv_flush(bs);
>>>>
>>>> This looks really strange. What is this supposed to do?
>>>>
>>>> One point is that you write it immediately after bdrv_aio_write, so you
>>>> get an fsync for which you don't know if it includes the current write
>>>> request or if it doesn't. Which data do you want to get flushed to the disk?
>>>
>>> I was expecting to flush the aio request that was just initiated.
>>> Am I misunderstanding the function?
>>
>> Seems so. The function names don't use really clear terminology either,
>> so you're not the first one to fall in this trap. Basically we have:
>>
>> * qemu_aio_flush() waits for all AIO requests to complete. I think you
>> wanted to have exactly this, but only for a single block device. Such a
>> function doesn't exist yet.
>>
>> * bdrv_flush() makes sure that all successfully completed requests are
>> written to disk (by calling fsync)
>>
>> * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
>> the fsync in the thread pool
>
> Then what I wanted to do is call qemu_aio_flush() first, then
> bdrv_flush(). It should be like live migration.

Okay, that makes sense. :-)

>>>> The other thing is that you introduce a bdrv_flush for each request,
>>>> basically forcing everyone to something very similar to writethrough
>>>> mode. I'm sure this will have a big impact on performance.
>>>
>>> The reason is to avoid inversion of queued requests. Although
>>> processing one-by-one is heavy, wouldn't having requests flushed
>>> to disk out of order break the disk image?
>>
>> No, that's fine. If a guest issues two requests at the same time, they
>> may complete in any order. You just need to make sure that you don't
>> call the completion callback before the request really has completed.
>
> We need to flush requests, meaning aio and fsync, before sending
> the final state of the guests, to make sure we can switch to the
> secondary safely.

In theory I think you could just re-submit the requests on the secondary
if they had not completed yet.

But you're right, let's keep things simple for the start.

>> I'm just starting to wonder if the guest won't timeout the requests if
>> they are queued for too long. Even more, with IDE, it can only handle
>> one request at a time, so not completing requests doesn't sound like a
>> good idea at all. In what intervals is the event-tap queue flushed?
>
> The requests are flushed once each transaction completes. So
> it's not at fixed intervals.

Right. So when is a transaction completed? This is the time that a
single request will take.

>> On the other hand, if you complete before actually writing out, you
>> don't get timeouts, but you signal success to the guest when the request
>> could still fail. What would you do in this case? With a writeback cache
>> mode we're fine, we can just fail the next flush (until then nothing is
>> guaranteed to be on disk and order doesn't matter either), but with
>> cache=writethrough we're in serious trouble.
>>
>> Have you thought about this problem? Maybe we end up having to flush the
>> event-tap queue for each single write in writethrough mode.
>
> Yes, and that's what I'm trying to do at this point.

Oh, I must have missed that code. Which patch/function should I look at?

> I know that
> performance matters a lot, but sacrificing reliability for
> performance now isn't a good idea. I first want to lay the
> groundwork, and then focus on optimization. Note that without the
> dirty bitmap optimization, Kemari suffers a lot when sending RAM.
> Anthony and I discussed taking this approach at KVM Forum.

I agree, starting simple makes sense.

Kevin