From: Maxim Levitsky <mlevitsk@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
Max Reitz <mreitz@redhat.com>,
qemu-block@nongnu.org, John Ferlan <jferlan@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segement count only for SCSI passthrough
Date: Wed, 10 Jul 2019 16:43:00 +0300
Message-ID: <0ab783dd5362725b086cf2087a5015b1af0735ad.camel@redhat.com>
In-Reply-To: <20190704124342.7753-2-mlevitsk@redhat.com>
On Thu, 2019-07-04 at 15:43 +0300, Maxim Levitsky wrote:
> For regular kernel block devices (/dev/sda*, /dev/nvme*, etc.), the
> hardware limits on max segment size and max segment count are not
> requirements that userspace has to honor: the kernel block layer
> splits incoming requests that violate them.
>
> Letting the kernel do the splitting spares qemu the overhead of
> splitting the requests itself.
>
> This is especially visible with an nbd server that exports a mostly
> empty qcow2 image over the network as a raw file. In this case most
> reads by the remote user never reach the underlying kernel block
> device, so most of the overhead is in the nbd traffic, which grows
> significantly as the max transfer size shrinks.
>
> In addition, even local block device access gets slightly faster,
> because large transfers (e.g. for image conversion) generate less
> traffic between qemu and the kernel.
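
To make the request-count argument above concrete, here is a minimal sketch of how many requests a transfer needs when every request is capped at a given max transfer size. It is not QEMU code: `requests_needed` is a hypothetical helper, and it ignores alignment and per-request protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): number of requests needed to
 * move `len` bytes when each request may carry at most `max_transfer`
 * bytes. This is a ceiling division.
 */
uint64_t requests_needed(uint64_t len, uint64_t max_transfer)
{
    assert(max_transfer > 0);
    return len / max_transfer + (len % max_transfer != 0);
}
```

Under these assumptions a 1 GiB read capped at 64 KiB takes 16384 requests, while the same read capped at 2 MiB takes 512.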
>
> More info can be found at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104
>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
> block/file-posix.c | 54 ++++++++++++++++++++++++----------------------
> 1 file changed, 28 insertions(+), 26 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index ab05b51a66..4479cc7ab4 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1038,15 +1038,13 @@ static void raw_reopen_abort(BDRVReopenState *state)
>      s->reopen_state = NULL;
>  }
>
> -static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
> +static int sg_get_max_transfer_length(int fd)
>  {
>  #ifdef BLKSECTGET
>      int max_bytes = 0;
> -    short max_sectors = 0;
> -    if (bs->sg && ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
> +
> +    if (ioctl(fd, BLKSECTGET, &max_bytes) == 0) {
>          return max_bytes;
> -    } else if (!bs->sg && ioctl(fd, BLKSECTGET, &max_sectors) == 0) {
> -        return max_sectors << BDRV_SECTOR_BITS;
>      } else {
>          return -errno;
>      }
> @@ -1055,25 +1053,31 @@ static int hdev_get_max_transfer_length(BlockDriverState *bs, int fd)
>  #endif
>  }
>
> -static int hdev_get_max_segments(const struct stat *st)
> +static int sg_get_max_segments(int fd)
>  {
>  #ifdef CONFIG_LINUX
>      char buf[32];
>      const char *end;
> -    char *sysfspath;
> +    char *sysfspath = NULL;
>      int ret;
> -    int fd = -1;
> +    int sysfd = -1;
>      long max_segments;
> +    struct stat st;
> +
> +    if (fstat(fd, &st)) {
> +        ret = -errno;
> +        goto out;
> +    }
>
>      sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -                                major(st->st_rdev), minor(st->st_rdev));
> -    fd = open(sysfspath, O_RDONLY);
> -    if (fd == -1) {
> +                                major(st.st_rdev), minor(st.st_rdev));
> +    sysfd = open(sysfspath, O_RDONLY);
> +    if (sysfd == -1) {
>          ret = -errno;
>          goto out;
>      }
>      do {
> -        ret = read(fd, buf, sizeof(buf) - 1);
> +        ret = read(sysfd, buf, sizeof(buf) - 1);
>      } while (ret == -1 && errno == EINTR);
>      if (ret < 0) {
>          ret = -errno;
> @@ -1090,8 +1094,8 @@ static int hdev_get_max_segments(const struct stat *st)
>      }
>
>  out:
> -    if (fd != -1) {
> -        close(fd);
> +    if (sysfd != -1) {
> +        close(sysfd);
>      }
>      g_free(sysfspath);
>      return ret;
> @@ -1103,19 +1107,17 @@ out:
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      BDRVRawState *s = bs->opaque;
> -    struct stat st;
>
> -    if (!fstat(s->fd, &st)) {
> -        if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode)) {
> -            int ret = hdev_get_max_transfer_length(bs, s->fd);
> -            if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> -                bs->bl.max_transfer = pow2floor(ret);
> -            }
> -            ret = hdev_get_max_segments(&st);
> -            if (ret > 0) {
> -                bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> -                                          ret * getpagesize());
> -            }
> +    if (bs->sg) {
> +        int ret = sg_get_max_transfer_length(s->fd);
> +
> +        if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> +            bs->bl.max_transfer = pow2floor(ret);
> +        }
> +
> +        ret = sg_get_max_segments(s->fd);
> +        if (ret > 0) {
> +            bs->bl.max_transfer = MIN(bs->bl.max_transfer, ret * getpagesize());
>          }
>      }
>
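
For readers skimming the last hunk, the limit calculation that now runs only when bs->sg is set can be condensed into a pure function. This is a rough sketch under stated assumptions, not the patch itself: `pow2floor_sketch` stands in for QEMU's pow2floor(), `request_max` for BDRV_REQUEST_MAX_BYTES, `page_size` for getpagesize(), and the two query results are passed in as `max_bytes` and `max_segments` instead of being read via ioctl and sysfs. All of these names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's pow2floor(): largest power of two <= v. */
uint32_t pow2floor_sketch(uint32_t v)
{
    uint32_t p = 1;

    assert(v > 0);
    while (p <= v / 2) {
        p *= 2;
    }
    return p;
}

/*
 * Hypothetical condensation of the SCSI-passthrough branch:
 * cur_max      - current max_transfer (0 means "no limit", as in QEMU)
 * max_bytes    - BLKSECTGET result, or a negative errno on failure
 * max_segments - sysfs max_segments value, or a negative errno on failure
 */
uint32_t sg_max_transfer_sketch(uint32_t cur_max, int max_bytes,
                                int max_segments, uint32_t page_size,
                                uint32_t request_max)
{
    if (max_bytes > 0 && (uint32_t)max_bytes <= request_max) {
        cur_max = pow2floor_sketch((uint32_t)max_bytes);
    }
    if (max_segments > 0) {
        /* 64-bit product here is a choice of this sketch, to avoid overflow. */
        uint64_t seg_limit = (uint64_t)max_segments * page_size;

        if (seg_limit < cur_max) {
            cur_max = (uint32_t)seg_limit;
        }
    }
    return cur_max;
}
```

For example, with a reported max of 1280 KiB, 128 segments and 4 KiB pages, the sketch first rounds down to 1 MiB and then clamps to 128 * 4096 = 512 KiB. If the first query fails and the limit is still 0, the segment clamp leaves it at 0, matching MIN() in the hunk.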
Ping.
Best regards,
Maxim Levitsky
Thread overview: 6+ messages
2019-07-04 12:43 [Qemu-devel] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Maxim Levitsky
2019-07-04 12:43 ` [Qemu-devel] [PATCH v2 1/1] raw-posix.c - use max transfer length / max segement count only for SCSI passthrough Maxim Levitsky
2019-07-10 13:43 ` Maxim Levitsky [this message]
2019-07-11 10:31 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/1] Don't obey the kernel block device max transfer len / max segments for raw block devices Stefan Hajnoczi
2019-07-12 8:32 ` [Qemu-devel] " Pankaj Gupta
2019-07-12 9:20 ` Kevin Wolf