Date: Wed, 20 Mar 2013 14:29:24 +0100
From: Stefan Hajnoczi
To: Benoît Canet
Cc: kwolf@redhat.com, wuzhy@linux.vnet.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] block: fix bdrv_exceed_iops_limits wait computation
Message-ID: <20130320132924.GB14441@stefanha-thinkpad.muc.redhat.com>
In-Reply-To: <1363770734-30970-2-git-send-email-benoit@irqsave.net>

On Wed, Mar 20, 2013 at 10:12:14AM +0100, Benoît Canet wrote:
> This patch fixes an I/O throttling behavior triggered by limiting at 150 iops
> and running a load of 50 threads doing random pwrites on a raw virtio device.
>
> After a few moments the iops count starts to oscillate steadily between 0 and
> a value higher than the limit.
>
> As the load keeps running, the period and the amplitude of the oscillation
> increase until I/O bursts reaching the physical storage's maximum iops count
> occur.
>
> These bursts are followed by guest I/O starvation.
>
> Since the period of this oscillation cycle keeps increasing, the cause must
> be a computation error that slowly inflates the wait time.
>
> This patch makes the wait time a bit smaller and tests confirm that it solves
> the oscillating behavior.
>
> Signed-off-by: Benoit Canet
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 0a062c9..455d8b0 100644
> --- a/block.c
> +++ b/block.c
> @@ -3739,7 +3739,7 @@ static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>      }
>  
>      /* Calc approx time to dispatch */
> -    wait_time = (ios_base + 1) / iops_limit;
> +    wait_time = ios_base / iops_limit;
>      if (wait_time > elapsed_time) {
>          wait_time = wait_time - elapsed_time;
>      } else {

I tried reproducing without your test case:

  dd if=/dev/vdb of=/dev/null bs=4096 iflag=direct

I've pasted printfs below which reveal that the wait time increases
monotonically!  In other words, dd is slowed down more and more as it runs:

bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1426 ms
bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1431 ms
bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
bs 0x7f56fe187a30 co 0x7f56fe211cb0 throttled for 1437 ms
bs 0x7f56fe187a30 co 0x7f56fe211cb0 woke up from throttled_reqs after sleeping
...

Killing dd and starting it again resets the accumulated delay (probably
because we end the slice and state is cleared).  This suggests workloads
that are constantly at the I/O limit will experience creeping delay or the
oscillations you found.
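
For scale: before the clamp against elapsed_time, the patch shifts the
computed wait for each throttling decision by exactly 1/iops_limit.  A tiny
standalone calculation (made-up numbers, not QEMU code) shows the
per-decision difference between the two formulas:

  #include <stdio.h>

  int main(void)
  {
      /* Made-up example values: 150 IOPS limit, 220 I/Os accounted in the
       * current slice, 1.2 s elapsed since slice_start.  wait_time in the
       * real formula is in seconds, hence the * 1e3 for milliseconds. */
      double iops_limit   = 150.0;
      double ios_base     = 220.0;
      double elapsed_time = 1.2;

      double wait_old = (ios_base + 1) / iops_limit - elapsed_time; /* current code */
      double wait_new = ios_base / iops_limit - elapsed_time;       /* with this patch */

      /* Prints: old 273.3 ms, new 266.7 ms, delta 6.7 ms (= 1000 / iops_limit) */
      printf("old %.1f ms, new %.1f ms, delta %.1f ms\n",
             wait_old * 1e3, wait_new * 1e3, (wait_old - wait_new) * 1e3);
      return 0;
  }

So the change itself is tiny per decision; the open question is why a ~7 ms
shift accumulates into the multi-second delays shown above.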
After applying your patch I observed the opposite behavior: the wait time
decreases until it resets itself.  Perhaps we're waiting less and less
until we just finish the slice and all values reset:

bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 496 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 489 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 484 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 480 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 474 ms
...
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 300 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 299 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 298 ms
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 woke up from throttled_reqs after sleeping
bs 0x7f2cd2c52a30 co 0x7f2cd2ce3910 throttled for 494 ms

I'm not confident that I understand the effects of your patch.  Do you
have an explanation for these results?

More digging will probably be necessary to solve the underlying problem
here.

diff --git a/block.c b/block.c
index 0a062c9..7a8c9e6 100644
--- a/block.c
+++ b/block.c
@@ -175,7 +175,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
     int64_t wait_time = -1;
 
     if (!qemu_co_queue_empty(&bs->throttled_reqs)) {
+        fprintf(stderr, "bs %p co %p waiting for throttled_reqs\n", bs, qemu_coroutine_self());
         qemu_co_queue_wait(&bs->throttled_reqs);
+        fprintf(stderr, "bs %p co %p woke up from throttled_reqs\n", bs, qemu_coroutine_self());
     }
 
     /* In fact, we hope to keep each request's timing, in FIFO mode. The next
@@ -188,7 +190,9 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
     while (bdrv_exceed_io_limits(bs, nb_sectors, is_write, &wait_time)) {
         qemu_mod_timer(bs->block_timer,
                        wait_time + qemu_get_clock_ns(vm_clock));
+        fprintf(stderr, "bs %p co %p throttled for %"PRId64" ms\n", bs, qemu_coroutine_self(), wait_time);
         qemu_co_queue_wait_insert_head(&bs->throttled_reqs);
+        fprintf(stderr, "bs %p co %p woke up from throttled_reqs after sleeping\n", bs, qemu_coroutine_self());
     }
 
     qemu_co_queue_next(&bs->throttled_reqs);
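
For anyone who wants to reproduce this, an iops limit on the -drive option
plus the dd loop above is enough.  Something along these lines should work
(the image path and the other -drive options here are only placeholders,
pick whatever matches your setup):

  qemu-system-x86_64 ... \
      -drive file=test.raw,if=virtio,cache=none,iops=150

and then run the dd command against the corresponding virtio disk (e.g.
/dev/vdb) inside the guest.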