From: Michael Tokarev <mjt@tls.msk.ru>
To: Jan Kara <jack@suse.cz>
Cc: qemu-devel <qemu-devel@nongnu.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: slow ext4 O_SYNC writes (why qemu qcow2 is so slow on ext4 vs ext3)
Date: Tue, 20 Jul 2010 17:41:33 +0300 [thread overview]
Message-ID: <4C45B59D.8040207@msgid.tls.msk.ru> (raw)
In-Reply-To: <20100720134646.GC3657@quack.suse.cz>
20.07.2010 16:46, Jan Kara wrote:
> Hi,
>
> On Fri 02-07-10 16:46:28, Michael Tokarev wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I noticed that qcow2 images, esp. fresh ones (so that they
>> receive lots of metadata updates), are very slow on my
>> machine.  And on IRC (#kvm), Sheldon Hearn found that on
>> ext3 it is fast again.
>>
>> So I tested different combinations for a bit, and observed
>> the following:
>>
>> For a fresh qcow2 file, with default qemu cache settings,
>> copying a kernel source tree is about 10 times slower on ext4
>> than on ext3.  A second copy (rewrite) is significantly
>> faster in both cases (as expected), but still ~20% slower
>> on ext4 than on ext3.
>>
>> Normal cache mode in qemu is writethrough, which translates
>> to O_SYNC file open mode.
>>
>> With cache=none, which translates to O_DIRECT, metadata-
>> intensive writes (fresh qcow) are about as slow as on
>> ext4 with O_SYNC, and rewrite is expectedly faster, but
>> now there's _no_ difference in speed between ext3 and ext4.
>>
>> I did a series of straces of the writer processes; the time
>> spent in pwrite() syscalls is significantly larger for
>> ext4 with O_SYNC than for ext3 with O_SYNC, by a factor
>> of about 50.
>>
>> Also, with the slower I/O in the ext4 case, qemu-kvm starts more
>> I/O threads, which, it seems, slows the whole thing down even
>> further - I changed max_threads from the default 64 to 16, and
>> the speed improved slightly.  Here the difference is again
>> significant: on ext3 qemu spawns only 8 threads, while on
>> ext4 all 64 I/O threads are spawned almost immediately.
>>
>> So I've two questions:
>>
>> 1. Why is ext4 with O_SYNC so slow compared with ext3 with
>>    O_SYNC?  This is observed on 2.6.32 and 2.6.34 kernels;
>>    barriers or data={writeback|ordered} made no difference.
>>    I tested the whole thing on a partition on a single drive;
>>    sheldonh used ext[34] on top of lvm on a raid1 volume.
> Do I get it right that you have ext3/4 carrying fs images used by
> KVM?  What you describe is strange.  Up to this moment it sounded to me
> like a difference in barrier settings on the host, but you seem to have
> tried that.  Just stabbing in the dark - could you try the nodelalloc
> mount option of ext4?
Yes, exactly, a guest filesystem image stored on ext3 or
ext4. And yes, I suspected barriers too, but immediately
ruled that out, since barrier or no barrier does not matter
in this test.
I'll try nodelalloc, but I'm not sure when: right now I'm on
vacation, typing from a hotel, and my home machine with all
the guest images and the like is turned off and - for some
reason - I can't wake it up over ethernet; it seemingly ignores
WOL packets.  Too bad I don't have any guest image here on my
notebook.
>> 2. The number of threads spawned for I/O... this is a good
>>    question: how to find an adequate cap.  Different hardware has
>>    different capabilities, and we may have more users doing
>>    I/O at the same time...
> Maybe you could measure your total throughput over some period,
> try increasing the number of threads in the next period and, if it
> helps significantly, use the larger number; otherwise go back to the
> smaller number?
Well, this is, again, a good question -- it's how qemu works right
now, spawning up to 64 I/O threads for all the I/O requests guests
submit.  The slower the I/O, the more threads get spawned.
Working that part out is a separate, difficult job.
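Just to make the feedback idea concrete, here is a rough sketch of
what such a controller could look like (the function name, thresholds,
and bounds are all made up for illustration; nothing like this exists
in qemu today):

```c
/* Hypothetical sketch of a throughput-feedback thread cap, per Jan's
 * suggestion: raise the cap when throughput improved noticeably over
 * the last period, back off when it dropped.  Thresholds are made up. */
#include <assert.h>

static int cap = 8;            /* current I/O thread cap */
static double last_tput = 0.0; /* throughput seen in the previous period */

/* Called once per measurement period with the achieved bytes/sec;
 * returns the cap to use for the next period. */
int adjust_thread_cap(double tput)
{
    if (tput > last_tput * 1.10 && cap < 64)
        cap *= 2;              /* more threads helped: keep growing */
    else if (tput < last_tput * 0.90 && cap > 4)
        cap /= 2;              /* it hurt (or load fell): back off */
    last_tput = tput;
    return cap;
}
```

The hard part, as noted above, is not the arithmetic but choosing the
measurement period and distinguishing "more threads hurt" from "the
guest simply submitted less I/O".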
The main question here is why ext4 is so slow for O_[D]SYNC writes.
Besides, a quite similar topic was discussed in the meantime, in a
different thread titled "BTRFS: Unbelievably slow with kvm/qemu" -- see
e.g. http://marc.info/?t=127891236700003&r=1&w=2 .  In particular, this
message http://marc.info/?l=linux-kernel&m=127913696420974 shows
a comparison table for a few filesystems and qemu/kvm usage, but on
raw files instead of qcow2.
Different qemu/kvm guest fs image options are (partial list):

raw disk image in a file on the host, either pre-allocated or
(initially) sparse.  The pre-allocated case should - in
theory - work equally well on all filesystems, while the sparse
case will differ per filesystem, depending on how each
filesystem allocates data.

qcow[2] image in a file on the host.  This one is never sparse,
but unlike raw it also contains some qemu-specific metadata,
like which blocks are allocated and in which place, sort of
like lvm.  Initially it is created empty (with only a header),
and when the guest performs writes, new blocks are allocated and
the metadata gets updated.  This requires more writes than the
guest itself performs, and quite a few syncs (with O_SYNC
they're automatic).
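The write amplification described above can be shown with a toy model
(this is a deliberate simplification of my own, not qemu's actual
qcow2 code, and the names and cluster size are invented): the first
write to a cluster costs a data write plus an allocation-table update,
while a rewrite costs only the data write - which matches the
fresh-image vs. rewrite difference seen in the tests.

```c
/* Toy model of a qcow2-style format: a one-level allocation table
 * mapping guest clusters to host offsets.  NOT qemu's code; qcow2
 * really uses two-level L1/L2 tables plus refcount blocks. */
#include <assert.h>
#include <stdint.h>

#define CLUSTERS 64

struct toy_qcow {
    uint64_t l2[CLUSTERS]; /* guest cluster -> host offset (0 = unallocated) */
    uint64_t next_free;    /* next free host offset */
    int file_writes;       /* pwrite()s the image file would see */
};

/* One guest cluster write; returns how many host writes it cost. */
int toy_write_cluster(struct toy_qcow *q, int guest_cluster)
{
    int before = q->file_writes;
    if (q->l2[guest_cluster] == 0) {
        q->l2[guest_cluster] = q->next_free;
        q->next_free += 65536;
        q->file_writes++;  /* metadata: updated allocation-table entry */
    }
    q->file_writes++;      /* the guest data itself */
    return q->file_writes - before;
}
```

Under O_SYNC each of those host writes is synchronous, so a fresh
image pays the sync penalty at least twice per guest write; that is
where an expensive O_SYNC path hurts qcow2 far more than raw.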
Thanks!
/mjt
Thread overview: 4+ messages
2010-07-02 12:46 slow ext4 O_SYNC writes (why qemu qcow2 is so slow on ext4 vs ext3) Michael Tokarev
2010-07-20 13:46 ` Jan Kara
2010-07-20 14:41 ` Michael Tokarev [this message]
2010-07-20 15:59 ` Jan Kara