From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 02/12] file-posix: Consider max_segments for BlockLimits.max_transfer
Date: Mon, 13 Mar 2017 15:54:58 +0100 [thread overview]
Message-ID: <1489416908-3771-3-git-send-email-kwolf@redhat.com> (raw)
In-Reply-To: <1489416908-3771-1-git-send-email-kwolf@redhat.com>
From: Fam Zheng <famz@redhat.com>
BlockLimits.max_transfer can be too high without this fix, guest will
encounter I/O error or even get paused with werror=stop or rerror=stop. The
cause is explained below.
Linux has a separate limit, /sys/block/.../queue/max_segments, which in
the worst case can be more restrictive than the BLKSECTGET which we
already consider (note that they are two different things). So, the
failure scenario before this patch is:
1) host device has max_sectors_kb = 4096 and max_segments = 64;
2) guest learns max_sectors_kb limit from QEMU, but doesn't know
max_segments;
3) guest issues e.g. a 512KB request thinking it's okay, but actually
it's not, because it will be passed through to host device as an
SG_IO req that has niov > 64;
4) host kernel doesn't like the segmenting of the request, and returns
-EINVAL;
This patch checks the max_segments sysfs entry for the host device and
calculates a "conservative" bytes limit using the page size, which is
then merged into the existing max_transfer limit. Guest will discover
this from the usual virtual block device interfaces. (In the case of
scsi-generic, it will be done in the INQUIRY reply interception in
device model.)
The other possibility is to actually propagate it as a separate limit,
but it's not better. On the one hand, there is a big complication: the
limit is per-LUN in QEMU PoV (because we can attach LUNs from different
host HBAs to the same virtio-scsi bus), but the channel to communicate
it in a per-LUN manner is missing down the stack; on the other hand,
two limits versus one doesn't change much about the valid size of I/O
(because guest has no control over host segmenting).
Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
was explored. Unfortunately there is no neat way to ensure the bounce
buffer is less segmented (in terms of DMA addr) than the guest buffer.
Practically, this bug is not very common. It is only reported on a
Emulex (lpfc), so it's okay to get it fixed in the easier way.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/file-posix.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/block/file-posix.c b/block/file-posix.c
index 4de1abd..c4c0663 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -668,6 +668,48 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
#endif
}
+static int hdev_get_max_segments(const struct stat *st)
+{
+#ifdef CONFIG_LINUX
+ char buf[32];
+ const char *end;
+ char *sysfspath;
+ int ret;
+ int fd = -1;
+ long max_segments;
+
+ sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
+ major(st->st_rdev), minor(st->st_rdev));
+ fd = open(sysfspath, O_RDONLY);
+ if (fd == -1) {
+ ret = -errno;
+ goto out;
+ }
+ do {
+ ret = read(fd, buf, sizeof(buf));
+ } while (ret == -1 && errno == EINTR);
+ if (ret < 0) {
+ ret = -errno;
+ goto out;
+ } else if (ret == 0) {
+ ret = -EIO;
+ goto out;
+ }
+ buf[ret] = 0;
+ /* The file is ended with '\n', pass 'end' to accept that. */
+ ret = qemu_strtol(buf, &end, 10, &max_segments);
+ if (ret == 0 && end && *end == '\n') {
+ ret = max_segments;
+ }
+
+out:
+ g_free(sysfspath);
+ return ret;
+#else
+ return -ENOTSUP;
+#endif
+}
+
static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
{
BDRVRawState *s = bs->opaque;
@@ -679,6 +721,11 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
bs->bl.max_transfer = pow2floor(ret);
}
+ ret = hdev_get_max_segments(&st);
+ if (ret > 0) {
+ bs->bl.max_transfer = MIN(bs->bl.max_transfer,
+ ret * getpagesize());
+ }
}
}
--
1.8.3.1
next prev parent reply other threads:[~2017-03-13 14:55 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-13 14:54 [Qemu-devel] [PULL 00/12] Block layer fixes for 2.9.0-rc1 Kevin Wolf
2017-03-13 14:54 ` [Qemu-devel] [PULL 01/12] backup: allow target without .bdrv_get_info Kevin Wolf
2017-03-13 14:54 ` Kevin Wolf [this message]
2017-03-13 14:54 ` [Qemu-devel] [PULL 03/12] block: Drop unmaintained 'archipelago' driver Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 04/12] backup: React to bdrv_is_allocated() errors Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 05/12] vvfat: " Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 06/12] migration: Document handling of " Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 07/12] block: Remove check_new_perm from bdrv_replace_child() Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 08/12] block: Request block status from *file for BDRV_BLOCK_RAW Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 09/12] commit: Implement bdrv_commit_top.bdrv_co_get_block_status Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 10/12] block: Refresh filename after changing backing file Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 11/12] mirror: Implement .bdrv_refresh_filename Kevin Wolf
2017-03-13 14:55 ` [Qemu-devel] [PULL 12/12] commit: " Kevin Wolf
2017-03-13 18:05 ` [Qemu-devel] [PULL 00/12] Block layer fixes for 2.9.0-rc1 Peter Maydell
2017-03-14 9:29 ` Kevin Wolf
2017-03-14 9:51 ` Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1489416908-3771-3-git-send-email-kwolf@redhat.com \
--to=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).