qemu-devel.nongnu.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"eblake@redhat.com" <eblake@redhat.com>,
	"fam@euphon.net" <fam@euphon.net>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	Denis Lunev <den@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH] block: don't probe zeroes in bs->file by default on block_status
Date: Thu, 24 Jan 2019 16:31:53 +0100
Message-ID: <20190124153153.GI4601@localhost.localdomain>
In-Reply-To: <62aa7f86-4ac7-0eeb-9e9d-30cb6ff229de@virtuozzo.com>

On 24.01.2019 at 15:36, Vladimir Sementsov-Ogievskiy wrote:
> 23.01.2019 19:33, Kevin Wolf wrote:
> > On 23.01.2019 at 12:53, Vladimir Sementsov-Ogievskiy wrote:
> >> 22.01.2019 21:57, Kevin Wolf wrote:
> >>> On 11.01.2019 at 12:40, Vladimir Sementsov-Ogievskiy wrote:
> >>>> 11.01.2019 13:41, Kevin Wolf wrote:
> >>>>> On 10.01.2019 at 14:20, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>> Since 5daa74a6ebc, bdrv_co_block_status digs into bs->file for an
> >>>>>> additional, more accurate search for holes inside regions that bs
> >>>>>> reports as DATA.
> >>>>>>
> >>>>>> This accuracy is not free: assume we have a qcow2 disk. qcow2 actually
> >>>>>> knows where the holes are and where the data is, yet every
> >>>>>> block_status request additionally calls lseek. Assume a big disk full
> >>>>>> of data: in any iterative copying block job (or qemu-img convert) we'll
> >>>>>> call lseek(SEEK_HOLE) on every iteration, and each of these lseeks has
> >>>>>> to iterate through all metadata up to the end of the file. This is
> >>>>>> obviously inefficient, and for many scenarios we don't need this lseek
> >>>>>> at all.
> >>>>>>
> >>>>>> So, let's disable the behavior introduced by 5daa74a6ebc by default,
> >>>>>> leaving an option to restore the previous behavior, which is needed
> >>>>>> for scenarios with preallocated images.
> >>>>>>
> >>>>>> Add an iotest illustrating the new option's semantics.
> >>>>>>
> >>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>>
> >>>>> I still think that an option isn't a good solution and we should try to
> >>>>> use some heuristics instead.
> >>>>
> >>>> Do you think that heuristics would be better than a proper cache for lseek results?
> >>>
> >>> I just played a bit with this (qemu-img convert only), and how much
> >>> caching lseek() results helps depends completely on the image. As it
> >>> happened, my test image was the worst case where caching didn't buy us
> >>> much. Obviously, I can just as easily construct an image where it makes
> >>> a huge difference. I think that most real-world images should be able to
> >>> take good advantage of it, though, and it doesn't hurt, so maybe that's
> >>> a first thing that we can do in any case. It might not be the complete
> >>> solution, though.
> >>>
> >>> Let me explain my test images: The case where all of this actually
> >>> matters for qemu-img convert is fragmented qcow2 images. If your image
> >>> isn't fragmented, we don't do lseek() a lot anyway because a single
> >>> bdrv_block_status() call already gives you the information for the whole
> >>> image. So I constructed a fragmented image, by writing to it backwards:
> >>>
> >>> ./qemu-img create -f qcow2 /tmp/test.qcow2 1G
> >>> for i in $(seq 16383 -1 0); do
> >>>       echo "write $((i * 65536)) 64k"
> >>> done | ./qemu-io /tmp/test.qcow2
> >>>
> >>> It's not really surprising that caching the lseek() results doesn't help
> >>> much there: we're moving backwards through the file, and lseek() only
> >>> returns results about what comes after the current position, not what
> >>> comes before it. So this is probably the worst case.
> >>>
> >>> So I constructed a second image, which is fragmented, too, but starts at
> >>> the beginning of the image file:
> >>>
> >>> ./qemu-img create -f qcow2 /tmp/test_forward.qcow2 1G
> >>> for i in $(seq 0 2 16383); do
> >>>       echo "write $((i * 65536)) 64k"
> >>> done | ./qemu-io /tmp/test_forward.qcow2
> >>> for i in $(seq 1 2 16384); do
> >>>       echo "write $((i * 65536)) 64k"
> >>> done | ./qemu-io /tmp/test_forward.qcow2
> >>>
> >>> Here caching makes a huge difference:
> >>>
> >>>       time ./qemu-img convert -p -n $IMG null-co://
> >>>
> >>>                           uncached        cached
> >>>       test.qcow2             ~145s         ~70s
> >>>       test_forward.qcow2     ~110s        ~0.2s
> >>
> >> I'm unsure about your results; at the very least, 0.2s means that we
> >> benefit from cached reads, not from lseek.
> > 
> > Yes, all reads are served from the kernel page cache; this is on tmpfs.
> > 
> > I chose tmpfs for two reasons: I wanted to get expensive I/O out of the
> > way so that the lseek() performance is even visible; and tmpfs was
> > reported to perform especially badly for SEEK_DATA/HOLE (which my results
> > confirm). So yes, this setup really makes the lseek() calls stand out
> > much more than in the common case (which makes sense when you want to
> > fix the overhead introduced by them).
> 
> OK, I missed this. On the other hand, tmpfs is not a real production case.

Yes, I fully agree. But it was a simple case where I knew there was a
problem.

I also have a bug report on XFS with an image that is very fragmented on
the file system level. But I don't know how to produce such a file to
run benchmarks on it.

Kevin
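The two fragmented test images from the quoted benchmark are built by piping generated qemu-io "write" commands into qemu-io. A minimal sketch of that generator, assuming GNU seq, the 1 GiB image size and the 64 KiB writes used in the thread; gen_writes is a hypothetical name, not part of QEMU. Cluster indices should stop at 16383 so that every write lies inside the 1 GiB image.

```shell
#!/bin/sh
# Hypothetical helper, not part of QEMU: print one qemu-io "write" command
# per 64 KiB cluster, visiting cluster indices FIRST, FIRST+STEP, ..., LAST.
# The three arguments are passed straight to seq as FIRST INCREMENT LAST.
gen_writes() {
    first=$1
    step=$2
    last=$3
    for i in $(seq "$first" "$step" "$last"); do
        # Byte offset of cluster i: 65536 bytes (64 KiB) per cluster.
        echo "write $((i * 65536)) 64k"
    done
}
```

Piping gen_writes 16383 -1 0 into ./qemu-io /tmp/test.qcow2 recreates the backward-written image; piping gen_writes 0 2 16383 and then gen_writes 1 2 16383 into ./qemu-io /tmp/test_forward.qcow2 recreates the interleaved forward one (each image first created with ./qemu-img create -f qcow2 <image> 1G). Keeping the generator separate from qemu-io makes the offsets easy to inspect before touching an image.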

Thread overview: 27+ messages
2019-01-10 13:20 [Qemu-devel] [PATCH] block: don't probe zeroes in bs->file by default on block_status Vladimir Sementsov-Ogievskiy
2019-01-10 20:51 ` Eric Blake
2019-01-11  7:54   ` Vladimir Sementsov-Ogievskiy
2019-01-11 10:13     ` Vladimir Sementsov-Ogievskiy
2019-01-11 16:02     ` Eric Blake
2019-01-11 16:05       ` Eric Blake
2019-01-11 16:22       ` Vladimir Sementsov-Ogievskiy
2019-01-11 17:12         ` Eric Blake
2019-01-11 10:41 ` Kevin Wolf
2019-01-11 11:40   ` Vladimir Sementsov-Ogievskiy
2019-01-11 12:21     ` Kevin Wolf
2019-01-11 12:59       ` Vladimir Sementsov-Ogievskiy
2019-01-11 13:15         ` Kevin Wolf
2019-01-11 16:09           ` Vladimir Sementsov-Ogievskiy
2019-01-11 17:04             ` Eric Blake
2019-01-11 17:27               ` Vladimir Sementsov-Ogievskiy
2019-01-22 18:57     ` Kevin Wolf
2019-01-23 11:53       ` Vladimir Sementsov-Ogievskiy
2019-01-23 16:33         ` Kevin Wolf
2019-01-24 14:36           ` Vladimir Sementsov-Ogievskiy
2019-01-24 15:31             ` Kevin Wolf [this message]
2019-01-24 15:47               ` Vladimir Sementsov-Ogievskiy
2019-01-23 12:04       ` Vladimir Sementsov-Ogievskiy
2019-01-24 14:37         ` Vladimir Sementsov-Ogievskiy
2019-01-24 15:39           ` Kevin Wolf
2019-01-24 15:49             ` Eric Blake
2019-01-24 15:53             ` Vladimir Sementsov-Ogievskiy
