All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	pbonzini@redhat.com, eblake@redhat.com
Subject: Re: [Qemu-devel] [PATCH for-2.9 v3] file-posix: Consider max_segments for BlockLimits.max_transfer
Date: Wed, 8 Mar 2017 13:34:10 +0100	[thread overview]
Message-ID: <20170308123410.GD5211@noname.redhat.com> (raw)
In-Reply-To: <20170308120814.29967-1-famz@redhat.com>

Am 08.03.2017 um 13:08 hat Fam Zheng geschrieben:
> BlockLimits.max_transfer can be too high without this fix, guest will
> encounter I/O error or even get paused with werror=stop or rerror=stop. The
> cause is explained below.
> 
> Linux has a separate limit, /sys/block/.../queue/max_segments, which in
> the worst case can be more restrictive than the BLKSECTGET which we
> already consider (note that they are two different things). So, the
> failure scenario before this patch is:
> 
> 1) host device has max_sectors_kb = 4096 and max_segments = 64;
> 2) guest learns max_sectors_kb limit from QEMU, but doesn't know
>    max_segments;
> 3) guest issues e.g. a 512KB request thinking it's okay, but actually
>    it's not, because it will be passed through to host device as an
>    SG_IO req that has niov > 64;
> 4) host kernel doesn't like the segmenting of the request, and returns
>    -EINVAL;
> 
> This patch checks the max_segments sysfs entry for the host device and
> calculates a "conservative" bytes limit using the page size, which is
> then merged into the existing max_transfer limit. Guest will discover
> this from the usual virtual block device interfaces. (In the case of
> scsi-generic, it will be done in the INQUIRY reply interception in
> device model.)
> 
> The other possibility is to actually propagate it as a separate limit,
> but it's not better. On the one hand, there is a big complication: the
> limit is per-LUN in QEMU PoV (because we can attach LUNs from different
> host HBAs to the same virtio-scsi bus), but the channel to communicate
> it in a per-LUN manner is missing down the stack; on the other hand,
> two limits versus one doesn't change much about the valid size of I/O
> (because guest has no control over host segmenting).
> 
> Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL,
> was explored. Unfortunately there is no neat way to ensure the bounce
> buffer is less segmented (in terms of DMA addr) than the guest buffer.
> 
> Practically, this bug is not very common. It is only reported on a
> Emulex (lpfc), so it's okay to get it fixed in the easier way.
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Fam Zheng <famz@redhat.com>

Thanks, applied to the block branch.

Kevin

      reply	other threads:[~2017-03-08 12:34 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-08 12:08 [Qemu-devel] [PATCH for-2.9 v3] file-posix: Consider max_segments for BlockLimits.max_transfer Fam Zheng
2017-03-08 12:34 ` Kevin Wolf [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170308123410.GD5211@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.