From: Stefan Hajnoczi <stefanha@gmail.com>
To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
kwolf@redhat.com, qemu-devel@nongnu.org,
"Stefan Hajnoczi" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block: fix bdrv_exceed_iops_limits wait computation
Date: Thu, 21 Mar 2013 10:17:18 +0100
Message-ID: <20130321091718.GA12555@stefanha-thinkpad.redhat.com>
In-Reply-To: <1363828709.32706.86.camel@f15>
On Thu, Mar 21, 2013 at 09:18:27AM +0800, Zhi Yong Wu wrote:
> On Wed, 2013-03-20 at 16:12 +0100, Stefan Hajnoczi wrote:
> > On Wed, Mar 20, 2013 at 03:56:33PM +0100, Benoît Canet wrote:
> > > > But I don't understand why bs->slice_time is modified instead of keeping
> > > > it constant at 100 ms:
> > > >
> > > > bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > > bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > > > if (wait) {
> > > > *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > > }
> > >
> > > In bdrv_exceed_bps_limits there is an equivalent to this with a comment.
> > >
> > > ---------
> > > /* When the I/O rate at runtime exceeds the limits,
> > > * bs->slice_end need to be extended in order that the current statistic
> > > * info can be kept until the timer fire, so it is increased and tuned
> > > * based on the result of experiment.
> > > */
> > > bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > > if (wait) {
> > > *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > }
> > > ----------
> >
> > The comment explains why slice_end needs to be extended, but not why
> > bs->slice_time should be changed (except that it was tuned as the result
> > of an experiment).
> >
> > Zhi Yong: Do you remember a reason for modifying bs->slice_time?
> Stefan,
> When the bare I/O speed of the physical machine is very fast and the
> I/O speed is limited to a much lower value, I/O needs to wait for a
> relatively long time (i.e. wait_time). wait_time should be smaller than
> slice_time; if slice_time is constant, wait_time may not reach its
> expected value, so throttling will not work well.
> For example, say the bare I/O speed is 100MB/s, the I/O throttling speed
> is 1MB/s, and slice_time is constant and set to 50ms (an assumed value)
> or smaller. If the current I/O is to be throttled to 1MB/s, its
> wait_time is expected to be 100ms (an assumed value), which is bigger
> than the current slice_time, so the I/O throttling function will not
> throttle the actual I/O speed well. In that case, slice_time needs to be
> adjusted to a more suitable value that depends on wait_time.
When an I/O request spans a slice:
1. It must wait until enough resources are available.
2. We extend the slice so that existing accounting is not lost.

But I don't understand what you say about a fast host. The bare metal
throughput does not affect the throttling calculation. The only values
that matter are the bps limit and the slice time.

In your example the slice time is 50ms and the current request needs
100ms. We need to extend slice_end to at least 100ms so that we can
account for this request.

Why should slice_time be changed?
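
To make the question concrete, here is a minimal sketch of what I mean by
extending the slice while leaving slice_time alone. This is not the QEMU
code: the Slice struct, wait_needed_ms() and slice_extend() are
hypothetical names, times are plain milliseconds, and the 50ms constant is
the assumed value from the example above. Note that the host's bare metal
speed does not appear anywhere in the calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Constant slice length; 50 ms is the assumed value from the example. */
#define SLICE_TIME_MS 50

/* Hypothetical accounting state, not the real BlockDriverState fields. */
typedef struct {
    int64_t  slice_start_ms;
    int64_t  slice_end_ms;
    uint64_t bytes_dispatched;   /* bytes accounted since slice_start_ms */
} Slice;

/*
 * How long a request must wait so that the average rate since
 * slice_start_ms stays at or below bps_limit. Only the limit and the
 * accounting state are inputs.
 */
static int64_t wait_needed_ms(const Slice *s, int64_t now_ms,
                              uint64_t bps_limit, uint64_t req_bytes)
{
    assert(bps_limit > 0);
    uint64_t total = s->bytes_dispatched + req_bytes;
    int64_t earliest_ms = s->slice_start_ms +
                          (int64_t)(total * 1000 / bps_limit);
    return earliest_ms > now_ms ? earliest_ms - now_ms : 0;
}

/*
 * Extend the slice so the accounting survives until the request may run.
 * SLICE_TIME_MS itself is never modified, and the slice never shrinks.
 */
static void slice_extend(Slice *s, int64_t now_ms, int64_t wait_ms)
{
    assert(wait_ms >= 0);
    int64_t needed_end_ms = now_ms + wait_ms;
    if (s->slice_end_ms < needed_end_ms) {
        s->slice_end_ms = needed_end_ms;
    }
}
```

With a 1MB/s limit (taken as 1000000 bytes/s here) and a 100000 byte
request arriving at time 0 in a fresh 50ms slice, wait_needed_ms() gives
100ms and slice_extend() moves slice_end to 100ms, which is all the
example seems to require.
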
> In the other case, where the bare I/O speed is very slow and the I/O
> throttling speed is fast, slice_time also needs to be adjusted
> dynamically based on wait_time.
If the host is slower than the I/O limit there are two cases:
1. Requests are below the I/O limit. We do not throttle; the host is
slow but that's okay.
2. Requests are above the I/O limit. We throttle them, but the host
will actually slow them down further to the bare metal speed. This is
also fine.

Again, I don't see a need to change slice_time.
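
Put differently, in a simplified steady-state model the rate the guest
observes is just the smallest of three numbers. The sketch below is only
an illustration of that reasoning; observed_bps() and its parameters are
hypothetical names, not anything in the block layer.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Simplified model: the observed rate is capped by what the guest
 * offers, by the throttling limit, and by what the host can deliver.
 * slice_time plays no part in the result.
 */
static uint64_t observed_bps(uint64_t offered_bps, uint64_t limit_bps,
                             uint64_t host_bps)
{
    assert(host_bps > 0);   /* a host doing no I/O is outside this model */
    return min_u64(min_u64(offered_bps, limit_bps), host_bps);
}
```

In case 1 the offered rate is the minimum unless the host is slower
still, and in case 2 the slow host is the minimum. Either way throttling
does no harm.
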
BTW I discovered one thing that Linux blk-throttle does differently from
QEMU I/O throttling: we do not trim completed slices. I think trimming
avoids accumulating values which may lead to overflows if the slice
keeps getting extended due to continuous I/O.
blk-throttle does not modify throtl_slice (their equivalent of
slice_time).
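
A rough sketch of the trimming idea, as I understand it: once whole
slices have elapsed, drop them from the front of the window and subtract
the bytes the limit would have allowed in that time. This is loosely
modeled on the idea and not copied from the kernel; ThrottleSlice,
trim_slice() and the millisecond units are my own assumptions for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constant slice length, never modified (100 ms as mentioned earlier). */
#define SLICE_TIME_MS 100

/* Hypothetical accounting state for one direction of I/O. */
typedef struct {
    int64_t  slice_start_ms;
    int64_t  slice_end_ms;
    uint64_t bytes_dispatched;
} ThrottleSlice;

/*
 * Remove fully elapsed slices from the start of the window so that
 * bytes_dispatched does not keep growing while the slice is extended
 * by continuous I/O.
 */
static void trim_slice(ThrottleSlice *s, int64_t now_ms, uint64_t bps_limit)
{
    assert(now_ms >= s->slice_start_ms);

    uint64_t nr_slices =
        (uint64_t)(now_ms - s->slice_start_ms) / SLICE_TIME_MS;
    if (nr_slices == 0) {
        return;   /* no complete slice has elapsed yet */
    }

    /* Bytes the limit permits during the slices being trimmed. */
    uint64_t bytes_trim = bps_limit * SLICE_TIME_MS * nr_slices / 1000;

    if (s->bytes_dispatched > bytes_trim) {
        s->bytes_dispatched -= bytes_trim;
    } else {
        s->bytes_dispatched = 0;
    }
    s->slice_start_ms += (int64_t)(nr_slices * SLICE_TIME_MS);
}
```

For example, with a 1000000 bytes/s limit, 300000 bytes accounted since
time 0 and the clock at 250ms, two full slices are trimmed: the window
now starts at 200ms and 100000 bytes remain accounted.
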
Stefan