From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:34381)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1UIbci-00020W-AT
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 05:17:26 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1UIbcg-0000Zf-Mk
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 05:17:24 -0400
Received: from mail-wg0-f43.google.com ([74.125.82.43]:38560)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <stefanha@gmail.com>) id 1UIbcg-0000ZZ-Cp
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 05:17:22 -0400
Received: by mail-wg0-f43.google.com with SMTP id e12so2091854wge.22
	for <qemu-devel@nongnu.org>; Thu, 21 Mar 2013 02:17:21 -0700 (PDT)
Date: Thu, 21 Mar 2013 10:17:18 +0100
From: Stefan Hajnoczi <stefanha@gmail.com>
Message-ID: <20130321091718.GA12555@stefanha-thinkpad.redhat.com>
References: <1363770734-30970-1-git-send-email-benoit@irqsave.net>
	<1363770734-30970-2-git-send-email-benoit@irqsave.net>
	<20130320132924.GB14441@stefanha-thinkpad.muc.redhat.com>
	<20130320142842.GA2389@stefanha-thinkpad.muc.redhat.com>
	<20130320145633.GA1473@irqsave.net>
	<20130320151203.GA3584@stefanha-thinkpad.muc.redhat.com>
	<1363828709.32706.86.camel@f15>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1363828709.32706.86.camel@f15>
Subject: Re: [Qemu-devel] [PATCH] block: fix bdrv_exceed_iops_limits wait
 computation
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Cc: =?iso-8859-1?Q?Beno=EEt?= Canet <benoit.canet@irqsave.net>, kwolf@redhat.com, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>

On Thu, Mar 21, 2013 at 09:18:27AM +0800, Zhi Yong Wu wrote:
> On Wed, 2013-03-20 at 16:12 +0100, Stefan Hajnoczi wrote:
> > On Wed, Mar 20, 2013 at 03:56:33PM +0100, Benoît Canet wrote:
> > > > But I don't understand why bs->slice_time is modified instead of keeping
> > > > it constant at 100 ms:
> > > >
> > > >     bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > >     bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > > >     if (wait) {
> > > >         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > >     }
> > > 
> > > In bdrv_exceed_bps_limits there is an equivalent to this with a comment.
> > > 
> > > ---------
> > >   /* When the I/O rate at runtime exceeds the limits,
> > >      * bs->slice_end need to be extended in order that the current statistic
> > >      * info can be kept until the timer fire, so it is increased and tuned
> > >      * based on the result of experiment.
> > >      */
> > >     bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > >     bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > >     if (wait) {
> > >         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > >     }
> > > ----------
> > 
> > The comment explains why slice_end needs to be extended, but not why
> > bs->slice_time should be changed (except that it was tuned as the result
> > of an experiment).
> > 
> > Zhi Yong: Do you remember a reason for modifying bs->slice_time?
> Stefan,
>   In some case that the bare I/O speed is very fast on physical machine,
> when I/O speed is limited to be one lower value, I/O need to wait for
> one relative longer time(i.e. wait_time). You know, wait_time should be
> smaller than slice_time, if slice_time is constant, wait_time may not be
> its expected value, so the throttling function will not work well.
>   For example, bare I/O speed is 100MB/s, I/O throttling speed is 1MB/s,
> slice_time is constant, and set to 50ms(a assumed value) or smaller, If
> current I/O can be throttled to 1MB/s, its wait_time is expected to
> 100ms(a assumed value), and is more bigger than current slice_time, I/O
> throttling function will not throttle actual I/O speed well. In the
> case, slice_time need to be adjusted to one more suitable value which
> depends on wait_time.

When an I/O request spans a slice:
1. It must wait until enough resources are available.
2. We extend the slice so that existing accounting is not lost.

But I don't understand what you say about a fast host.  The bare metal
throughput does not affect the throttling calculation.  The only values
that matter are the bps limit and the slice time.

In your example the slice time is 50ms and the current request needs
100ms.  We need to extend slice_end to at least 100ms so that we can
account for this request.

Why should slice_time be changed?
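
To make that concrete, here is a minimal sketch of the behaviour I have
in mind.  It is not QEMU code: ThrottleSketch, throttle_start_slice(),
throttle_wait_ns() and SLICE_TIME_NS are made-up names, and the 50 ms
and 1 MB/s figures are just the assumed values from your example.  Only
slice_end is extended; the slice time is a constant.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC    1000000000LL
#define SLICE_TIME_NS 50000000LL   /* assumed 50 ms; never modified */

/* Hypothetical, simplified throttle state (not BlockDriverState). */
typedef struct {
    int64_t  slice_start;  /* ns */
    int64_t  slice_end;    /* ns */
    uint64_t bps_limit;    /* bytes per second, must be > 0 */
    uint64_t bytes_done;   /* bytes accounted since slice_start */
} ThrottleSketch;

static void throttle_start_slice(ThrottleSketch *t, int64_t now)
{
    t->slice_start = now;
    t->slice_end   = now + SLICE_TIME_NS;
    t->bytes_done  = 0;
}

/* Returns how long the request must wait in ns, or 0 if it may be
 * dispatched now (in which case it is accounted immediately). */
static int64_t throttle_wait_ns(ThrottleSketch *t, int64_t now,
                                uint64_t req_bytes)
{
    assert(t->bps_limit > 0);

    if (now > t->slice_end) {
        throttle_start_slice(t, now);
    }

    uint64_t total   = t->bytes_done + req_bytes;
    uint64_t allowed = t->bps_limit *
                       (uint64_t)(t->slice_end - t->slice_start) / NS_PER_SEC;
    if (total <= allowed) {
        t->bytes_done = total;
        return 0;
    }

    /* Time since slice_start at which 'total' bytes fit under the limit,
     * rounded up.  No overflow protection in this sketch. */
    int64_t needed = (int64_t)((total * NS_PER_SEC + t->bps_limit - 1) /
                               t->bps_limit);

    /* Extend the slice so that existing accounting is kept until the
     * request can run.  SLICE_TIME_NS itself stays constant. */
    if (t->slice_start + needed > t->slice_end) {
        t->slice_end = t->slice_start + needed;
    }
    return t->slice_start + needed - now;
}
```

With a 1 MB/s limit, a 100000 byte request at t=0 is told to wait 100 ms
and slice_end moves from 50 ms to 100 ms.  When it is retried at 100 ms
it fits under the limit and is dispatched.  Nothing here depends on how
fast the host is.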

>   In some other case that the bare I/O speed is very slow and I/O
> throttling speed is fast, slice_time also need to be adjusted
> dynamically based on wait_time.

If the host is slower than the I/O limit there are two cases:

1. Requests are below I/O limit.  We do not throttle; the host is slow
but that's okay.

2. Requests are above I/O limit.  We throttle them but actually the host
will slow them down further to the bare metal speed.  This is also fine.

Again, I don't see a need to change slice_time.

BTW I discovered one thing that Linux blk-throttle does differently from
QEMU I/O throttling: it trims completed slices and we do not.  I think
trimming avoids accumulating values which may lead to overflows if the
slice keeps getting extended due to continuous I/O.

blk-throttle does not modify throtl_slice (their equivalent of
slice_time).
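
A rough sketch of what trimming could look like, loosely modeled on my
reading of blk-throttle and not copied from it.  TrimSketch,
trim_completed_slices() and SLICE_TIME_NS are made-up names for
illustration, and the 100 ms slice is an assumed value.  Whole slices
that have already elapsed are dropped from the front of the window
together with the budget they were entitled to, so the counters stay
bounded even if slice_end keeps getting extended.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC    1000000000LL
#define SLICE_TIME_NS 100000000LL  /* assumed 100 ms; never modified */

/* Hypothetical, simplified throttle state (not BlockDriverState). */
typedef struct {
    int64_t  slice_start;  /* ns */
    int64_t  slice_end;    /* ns */
    uint64_t bps_limit;    /* bytes per second, must be > 0 */
    uint64_t bytes_done;   /* bytes accounted since slice_start */
} TrimSketch;

static void trim_completed_slices(TrimSketch *t, int64_t now)
{
    assert(t->bps_limit > 0);

    if (now <= t->slice_start) {
        return;
    }

    /* Number of whole slices that have fully elapsed. */
    int64_t nr_slices = (now - t->slice_start) / SLICE_TIME_NS;
    if (nr_slices == 0) {
        return;
    }

    /* Budget those slices were entitled to.  No overflow protection
     * in this sketch. */
    uint64_t bytes_trim = t->bps_limit *
                          (uint64_t)(nr_slices * SLICE_TIME_NS) / NS_PER_SEC;

    /* Forget the trimmed budget, clamping at zero. */
    t->bytes_done = t->bytes_done > bytes_trim ?
                    t->bytes_done - bytes_trim : 0;

    /* Move the start of the window forward; slice_end is untouched. */
    t->slice_start += nr_slices * SLICE_TIME_NS;
}
```

For example, with a 1 MB/s limit and 300000 bytes accounted since t=0,
trimming at t=250 ms drops two slices (200000 bytes) and leaves 100000
bytes accounted from a slice_start of 200 ms.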

Stefan