From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:37591)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benoit.canet@irqsave.net>) id 1XN0qw-0007FF-3O
	for qemu-devel@nongnu.org; Thu, 28 Aug 2014 10:39:12 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <benoit.canet@irqsave.net>) id 1XN0qo-0002oE-8x
	for qemu-devel@nongnu.org; Thu, 28 Aug 2014 10:39:06 -0400
Received: from lputeaux-656-01-25-125.w80-12.abo.wanadoo.fr
	([80.12.84.125]:57746 helo=paradis.irqsave.net)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benoit.canet@irqsave.net>) id 1XN0qn-0002nr-Uw
	for qemu-devel@nongnu.org; Thu, 28 Aug 2014 10:38:58 -0400
Date: Thu, 28 Aug 2014 16:38:09 +0200
From: =?iso-8859-1?Q?Beno=EEt?= Canet <benoit.canet@irqsave.net>
Message-ID: <20140828143809.GB28789@irqsave.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] IO accounting overhaul
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, anshul.makkar@profitbricks.com, armbru@redhat.com, stefanha@redhat.com


Hi,

I collected some items from a cloud provider's wishlist regarding I/O
accounting.

In a cloud, I/O accounting can serve three purposes: billing, helping the
customers, and doing metrology to help the cloud provider seek hidden costs.

I'll cover the first two topics in this mail because they are the most
important business-wise.

1) preferred place to collect billing IO accounting data:
---------------------------------------------------------
For billing purposes the collected data must be as close as possible to what
the customer would see by using iostat in his VM.

The first conclusion we can draw is that the choice of collecting the I/O
accounting data used for billing in the block device models is right.

2) what to do with occurrences of rare events:
----------------------------------------------

Another point is that QEMU developers agree that they don't know which policy
to apply to some I/O accounting events.
Must QEMU discard invalid write I/Os or account them as done?
Must QEMU count a failed read I/O as done?

When discussing this with a cloud provider the following appears: these
decisions are really specific to each cloud provider and QEMU should not
implement them.
The right thing to do is to add accounting counters to collect these events.

Moreover these rare events are precious troubleshooting data, so that is an
additional reason not to toss them.
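
To illustrate the "count, don't decide" idea, here is a minimal sketch in C.
All the identifiers (IoKind, IoOutcome, IoEventCounters, io_account_event,
billable_ios) are hypothetical and do not exist in QEMU; the sketch only
assumes that each request can be classified as done, invalid or failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: none of these identifiers come from QEMU. */
typedef enum { IO_READ, IO_WRITE, IO_FLUSH, IO_KIND_MAX } IoKind;

typedef enum {
    IO_DONE,     /* request completed successfully */
    IO_INVALID,  /* request rejected as invalid */
    IO_FAILED    /* request completed with an error */
} IoOutcome;

typedef struct {
    uint64_t done[IO_KIND_MAX];
    uint64_t invalid[IO_KIND_MAX];
    uint64_t failed[IO_KIND_MAX];
} IoEventCounters;

/* Record one event in its own bucket; no billing policy is applied here. */
static void io_account_event(IoEventCounters *c, IoKind kind,
                             IoOutcome outcome)
{
    assert(kind < IO_KIND_MAX);
    switch (outcome) {
    case IO_DONE:    c->done[kind]++;    break;
    case IO_INVALID: c->invalid[kind]++; break;
    case IO_FAILED:  c->failed[kind]++;  break;
    }
}

/* Example of a provider-side policy, applied outside the accounting code. */
static uint64_t billable_ios(const IoEventCounters *c, IoKind kind,
                             bool bill_failed, bool bill_invalid)
{
    uint64_t n = c->done[kind];
    if (bill_failed) {
        n += c->failed[kind];
    }
    if (bill_invalid) {
        n += c->invalid[kind];
    }
    return n;
}
```

With separate buckets, one provider can bill failed writes and another can
ignore them without any change on the collecting side, and the raw invalid
and failed counts stay available for troubleshooting.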

3) list of block I/O accounting metrics wished for billing and helping the customers
------------------------------------------------------------------------------------

Basic I/O accounting data will end up making the customers' bills.
Extra I/O accounting information would be a precious help for the cloud
provider to implement a monitoring panel like Amazon CloudWatch.

Here is the list of counters and statistics I would like to help implement in
QEMU.

This is the most important part of the mail and the one I would like the
community to review the most.

Once this list is settled I would proceed to implement the required
infrastructure in QEMU before using it in the device models.

/* volume of data transferred by the IOs */
read_bytes
write_bytes

/* operation count */
read_ios
write_ios
flush_ios

/* how many invalid IOs the guest submitted */
invalid_read_ios
invalid_write_ios
invalid_flush_ios

/* how many IO errors happened */
read_ios_error
write_ios_error
flush_ios_error

/* account the time spent doing IOs */
total_read_time
total_write_time
total_flush_time

/* since when the volume is idle */
qvolume_iddleness_time

/* the following would compute latencies for slices of 1 second then toss the
 * result and start a new slice. A weighted summation of the instant latencies
 * could help to implement this.
 */
1s_read_average_latency
1s_write_average_latency
1s_flush_average_latency

/* the former three numbers could be used to further compute a 1 minute slice value */
1m_read_average_latency
1m_write_average_latency
1m_flush_average_latency

/* the former three numbers could be used to further compute a 1 hour slice value */
1h_read_average_latency
1h_write_average_latency
1h_flush_average_latency

/* 1 second average number of requests in flight */
1s_read_queue_depth
1s_write_queue_depth

/* 1 minute average number of requests in flight */
1m_read_queue_depth
1m_write_queue_depth

/* 1 hour average number of requests in flight */
1h_read_queue_depth
1h_write_queue_depth
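
To make the slice idea behind the 1s and 1m latency values more concrete, here
is a minimal sketch in C. It is only one possible approach, under stated
assumptions: all names (LatencySlices, latency_roll, latency_account,
latency_avg_1m_ns) are hypothetical and not QEMU code, timestamps come from a
monotonic clock in nanoseconds, and slice_start_ns is initialised to the time
accounting starts. Each 1 second slice keeps a latency sum and a request
count instead of a weighted summation, and the 1 minute value is computed over
the last 60 closed slices (a sliding window rather than a strict 1 minute
slice), with each slice weighted by its request count.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL
#define SLICES_PER_MINUTE 60

/* Hypothetical names: a sketch, not QEMU code. */
typedef struct {
    uint64_t slice_start_ns; /* start of the currently open 1 s slice */
    uint64_t sum_ns;         /* latency sum of the open slice */
    uint64_t count;          /* requests completed in the open slice */
    uint64_t avg_1s_ns;      /* average of the last closed slice, 0 if empty */
    /* ring of closed slices used to derive the 1 minute value */
    uint64_t ring_sum_ns[SLICES_PER_MINUTE];
    uint64_t ring_count[SLICES_PER_MINUTE];
    unsigned ring_pos;
} LatencySlices;

/* Close every 1 s slice that ended at or before now_ns. */
static void latency_roll(LatencySlices *l, uint64_t now_ns)
{
    assert(now_ns >= l->slice_start_ns); /* clock must be monotonic */
    while (now_ns - l->slice_start_ns >= NS_PER_SEC) {
        l->avg_1s_ns = l->count ? l->sum_ns / l->count : 0;
        l->ring_sum_ns[l->ring_pos] = l->sum_ns;
        l->ring_count[l->ring_pos] = l->count;
        l->ring_pos = (l->ring_pos + 1) % SLICES_PER_MINUTE;
        l->sum_ns = 0;   /* toss the result and start a new slice */
        l->count = 0;
        l->slice_start_ns += NS_PER_SEC;
    }
}

/* Account one request that completed at now_ns with the given latency. */
static void latency_account(LatencySlices *l, uint64_t now_ns,
                            uint64_t latency_ns)
{
    latency_roll(l, now_ns);
    l->sum_ns += latency_ns;
    l->count++;
}

/* 1 minute average: per-slice averages weighted by their request counts,
 * which equals total latency divided by total requests. */
static uint64_t latency_avg_1m_ns(const LatencySlices *l)
{
    uint64_t sum = 0, count = 0;
    for (unsigned i = 0; i < SLICES_PER_MINUTE; i++) {
        sum += l->ring_sum_ns[i];
        count += l->ring_count[i];
    }
    return count ? sum / count : 0;
}
```

One such structure per operation type (read, write, flush) would be needed,
and the same ring technique could be repeated with 60 one-minute entries to
derive the 1 hour value. Weighting by request count keeps a nearly idle
second from counting as much as a busy one in the longer averages.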

4) Making this happen
---------------------

Outscale wants to make these I/O stats happen and gave me the go-ahead to do
whatever grunt work is required to do so.
That said, we could collaborate on some parts of the work.

Best regards

Benoît