From: Nir Soffer <nirsof@gmail.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Max Reitz <mreitz@redhat.com>, Nir Soffer <nsoffer@redhat.com>
Subject: [PATCH 2/2] block: file-posix: Replace posix_fallocate with fallocate
Date: Mon, 31 Aug 2020 17:01:27 +0300
Message-ID: <20200831140127.657134-3-nsoffer@redhat.com>
In-Reply-To: <20200831140127.657134-1-nsoffer@redhat.com>
If fallocate() is not supported, posix_fallocate() falls back to
inefficient allocation, writing one byte for every 4k bytes[1]. This is
very slow compared with writing zeroes. In oVirt we measured ~400%
improvement in allocation time when replacing posix_fallocate() with
manually writing zeroes[2].
We also know that posix_fallocate() does not work well when using OFD
locks[3]. We don't know the reason for this issue yet.
Change preallocate_falloc() to use fallocate() instead of
posix_fallocate(), and fall back to full preallocation if not supported.
Here are quick test results with this change.
Before (qemu-img-5.1.0-2.fc32.x86_64):
$ time qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
real 0m42.100s
user 0m0.602s
sys 0m4.137s
NFS stats:
calls retrans authrefrsh write
1571583 0 1572205 1571321
After:
$ time ./qemu-img create -f raw -o preallocation=falloc /tmp/nfs3/test.raw 6g
Formatting '/tmp/nfs3/test.raw', fmt=raw size=6442450944 preallocation=falloc
real 0m15.551s
user 0m0.070s
sys 0m2.623s
NFS stats:
calls retrans authrefrsh write
24620 0 24624 24567
[1] https://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html#96
[2] https://bugzilla.redhat.com/1850267#c25
[3] https://bugzilla.redhat.com/1851097
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
block/file-posix.c | 32 +++++++++-----------------
docs/system/qemu-block-drivers.rst.inc | 11 +++++----
docs/tools/qemu-img.rst | 11 +++++----
qapi/block-core.json | 4 ++--
4 files changed, 25 insertions(+), 33 deletions(-)
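The fallback described in the commit message can be sketched as a small
standalone program fragment. This is only an illustration, assuming Linux
and glibc; it is not the QEMU code. The names preallocate_range() and
write_zeroes() are hypothetical, and the 64 KiB buffer size is an arbitrary
choice made for the example.

```c
#define _GNU_SOURCE             /* needed for fallocate() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from QEMU): fill [offset, offset + len) with
 * zeroes. Only safe for a range that holds no data yet, such as the
 * area past the current end of file. Returns 0 or a negative errno.
 */
static int write_zeroes(int fd, off_t offset, off_t len)
{
    static const char buf[64 * 1024];   /* static storage, so all zero */

    while (len > 0) {
        size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
        ssize_t n = pwrite(fd, buf, chunk, offset);

        if (n < 0) {
            if (errno == EINTR) {
                continue;               /* interrupted, retry this chunk */
            }
            return -errno;
        }
        if (n == 0) {
            return -EIO;                /* no progress, give up */
        }
        offset += n;
        len -= n;
    }
    return 0;
}

/*
 * Hypothetical helper (not from QEMU): try fallocate() first and write
 * zeroes only if the filesystem reports that fallocate() is not
 * supported. Any other error is returned to the caller unchanged.
 */
static int preallocate_range(int fd, off_t offset, off_t len)
{
    int ret;

    assert(len > 0);

    do {
        ret = fallocate(fd, 0, offset, len);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        return 0;
    }
    if (errno != EOPNOTSUPP && errno != ENOSYS) {
        return -errno;
    }
    return write_zeroes(fd, offset, len);
}
```

In the patch itself the fallback is not a separate helper: when
preallocate_falloc() returns -ENOTSUP, handle_aiocb_truncate() falls
through from PREALLOC_MODE_FALLOC to PREALLOC_MODE_FULL.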
diff --git a/block/file-posix.c b/block/file-posix.c
index 341ffb1cb4..eac3c0b412 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1835,36 +1835,24 @@ static int allocate_first_block(int fd, size_t max_size)
static int preallocate_falloc(int fd, int64_t current_length, int64_t offset,
Error **errp)
{
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
int result;
if (offset == current_length)
return 0;
- /*
- * Truncating before posix_fallocate() makes it about twice slower on
- * file systems that do not support fallocate(), trying to check if a
- * block is allocated before allocating it, so don't do that here.
- */
-
- result = -posix_fallocate(fd, current_length,
- offset - current_length);
+ result = do_fallocate(fd, 0, current_length, offset - current_length);
if (result != 0) {
- /* posix_fallocate() doesn't set errno. */
- error_setg_errno(errp, -result,
- "Could not preallocate new data");
+ error_setg_errno(errp, -result, "Could not preallocate new data");
return result;
}
if (current_length == 0) {
/*
- * posix_fallocate() uses fallocate() if the filesystem supports
- * it, or fallback to manually writing zeroes. If fallocate()
- * was used, unaligned reads from the fallocated area in
- * raw_probe_alignment() will succeed, hence we need to allocate
- * the first block.
+ * Unaligned reads from the fallocated area in raw_probe_alignment()
+ * will succeed, hence we need to allocate the first block.
*
- * Optimize future alignment probing; ignore failures.
+ * Optimizes future alignment probing; ignore failures.
*/
allocate_first_block(fd, offset);
}
@@ -1973,10 +1961,12 @@ static int handle_aiocb_truncate(void *opaque)
}
switch (prealloc) {
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
case PREALLOC_MODE_FALLOC:
result = preallocate_falloc(fd, current_length, offset, errp);
- goto out;
+ if (result != -ENOTSUP)
+ goto out;
+ /* If fallocate() is not supported, fall back to full preallocation. */
#endif
case PREALLOC_MODE_FULL:
result = preallocate_full(fd, current_length, offset, errp);
@@ -3080,7 +3070,7 @@ static QemuOptsList raw_create_opts = {
.name = BLOCK_OPT_PREALLOC,
.type = QEMU_OPT_STRING,
.help = "Preallocation mode (allowed values: off"
-#ifdef CONFIG_POSIX_FALLOCATE
+#ifdef CONFIG_FALLOCATE
", falloc"
#endif
", full)"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index b052a6d14e..8e4acf397e 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -25,11 +25,12 @@ This section describes each format and the options that are supported for it.
.. program:: raw
.. option:: preallocation
- Preallocation mode (allowed values: ``off``, ``falloc``,
- ``full``). ``falloc`` mode preallocates space for image by
- calling ``posix_fallocate()``. ``full`` mode preallocates space
- for image by writing data to underlying storage. This data may or
- may not be zero, depending on the storage location.
+ Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+ ``falloc`` mode preallocates space for image by calling
+ ``fallocate()``, and falling back to ``full`` mode if not supported.
+ ``full`` mode preallocates space for image by writing data to
+ underlying storage. This data may or may not be zero, depending on
+ the storage location.
.. program:: image-formats
.. option:: qcow2
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index c35bd64822..a2089bd1b7 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -750,11 +750,12 @@ Supported image file formats:
Supported options:
``preallocation``
- Preallocation mode (allowed values: ``off``, ``falloc``,
- ``full``). ``falloc`` mode preallocates space for image by
- calling ``posix_fallocate()``. ``full`` mode preallocates space
- for image by writing data to underlying storage. This data may or
- may not be zero, depending on the storage location.
+ Preallocation mode (allowed values: ``off``, ``falloc``, ``full``).
+ ``falloc`` mode preallocates space for image by calling
+ ``fallocate()``, and falling back to ``full`` mode if not supported.
+ ``full`` mode preallocates space for image by writing data to
+ underlying storage. This data may or may not be zero, depending on
+ the storage location.
``qcow2``
diff --git a/qapi/block-core.json b/qapi/block-core.json
index db08c58d78..681d79ec63 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5021,8 +5021,8 @@
#
# @off: no preallocation
# @metadata: preallocate only for metadata
-# @falloc: like @full preallocation but allocate disk space by
-# posix_fallocate() rather than writing data.
+# @falloc: try to allocate disk space by fallocate(), and fall back to
+# @full preallocation if not supported.
# @full: preallocate all data by writing it to the device to ensure
# disk space is really available. This data may or may not be
# zero, depending on the image format and storage.
--
2.26.2