* [Qemu-devel] I/O accounting overhaul
@ 2015-06-03 13:40 Alberto Garcia
2015-06-03 14:18 ` Eric Blake
2015-06-08 13:21 ` Stefan Hajnoczi
0 siblings, 2 replies; 5+ messages in thread
From: Alberto Garcia @ 2015-06-03 13:40 UTC (permalink / raw)
To: qemu-devel
Cc: Kevin Wolf, qemu-block, Markus Armbruster, Max Reitz,
Stefan Hajnoczi
Hello,
I would like to pick up the work that Benoît was about to start last
year and extend the I/O accounting in QEMU. I have been reading the
past discussions and will try to summarize all the ideas here.
The current accounting code collects the following information:
typedef struct BlockAcctStats {
    uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
    uint64_t nr_ops[BLOCK_MAX_IOTYPE];
    uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
    uint64_t merged[BLOCK_MAX_IOTYPE];
    uint64_t wr_highest_sector;
} BlockAcctStats;
where the arrays hold information for read, write and flush
operations.
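For reference, the indices of those arrays are the BlockAcctType
values, which (if I remember correctly) look like this in
include/block/accounting.h:

enum BlockAcctType {
    BLOCK_ACCT_READ,
    BLOCK_ACCT_WRITE,
    BLOCK_ACCT_FLUSH,
    BLOCK_MAX_IOTYPE,
};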
The accounting stats are stored in the BlockDriverState, but they
actually belong to the device backed by the BDS, so they could probably
be moved there. For the interface we could extend BlockDeviceStats and
add the new fields, but query-blockstats works on BDS, so maybe we need
a new API?
The fields are mostly self-explanatory. merged counts the number of
requests merged into a single one (using virtio_blk_submit_multireq),
and wr_highest_sector is the number of the highest sector that has
been written.
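To illustrate the latter, the high-water mark is simply raised after
each completed write, roughly like this:

if (sector_num + nb_sectors - 1 > stats->wr_highest_sector) {
    stats->wr_highest_sector = sector_num + nb_sectors - 1;
}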
In addition to those we can have:
uint64_t nr_invalid_ops[BLOCK_MAX_IOTYPE];
uint64_t nr_failed_ops[BLOCK_MAX_IOTYPE];
Whether these two should also be counted as regular completed
operations (e.g. in total_time_ns) could be configurable by the user.
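A rough sketch of how failed requests could be accounted (the function
name and the 'account_failed' option are placeholders, not existing
API):

/* Called instead of block_acct_done() when a request fails */
void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
{
    stats->nr_failed_ops[cookie->type]++;

    /* Optionally count the failed request in the regular totals too;
     * 'account_failed' would be the new user-configurable option. */
    if (stats->account_failed) {
        int64_t latency_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) -
                             cookie->start_time_ns;
        stats->nr_ops[cookie->type]++;
        stats->total_time_ns[cookie->type] += latency_ns;
    }
}

Invalid operations (e.g. a request beyond the end of the device) could
simply bump nr_invalid_ops, since there is no completed request to
time.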
int64_t last_access_time_ns;
This would be updated after each operation, and would be useful to
know for how long a particular device has been idle.
uint64_t latency[BLOCK_MAX_IOTYPE];
The average amount added to total_time_ns[] during the past second (or
minute, or hour; the interval would be configurable). We could also
collect the maximum and minimum latencies for that period.
This could be updated every time an operation is accounted, so I
think it could be implemented without adding any timer.
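To make this concrete, here is a sketch of what I have in mind (all the
field and function names are provisional). The minimum and maximum
latencies could be tracked with two extra arrays updated in the same
place:

/* Provisional fields in BlockAcctStats */
int64_t  interval_length_ns;                 /* configurable, e.g. 1s  */
int64_t  interval_start_ns;
uint64_t interval_total_ns[BLOCK_MAX_IOTYPE];
uint64_t interval_ops[BLOCK_MAX_IOTYPE];
uint64_t latency[BLOCK_MAX_IOTYPE];          /* last published average */

/* Called from block_acct_done() with the latency of the finished
 * request; no timer is needed because the interval is checked here. */
static void acct_update_latency(BlockAcctStats *s, enum BlockAcctType type,
                                int64_t now_ns, int64_t latency_ns)
{
    if (now_ns - s->interval_start_ns >= s->interval_length_ns) {
        int i;
        /* The interval is over: publish the averages and reset */
        for (i = 0; i < BLOCK_MAX_IOTYPE; i++) {
            s->latency[i] = s->interval_ops[i] ?
                s->interval_total_ns[i] / s->interval_ops[i] : 0;
            s->interval_total_ns[i] = 0;
            s->interval_ops[i] = 0;
        }
        s->interval_start_ns = now_ns;
    }
    s->interval_total_ns[type] += latency_ns;
    s->interval_ops[type]++;
    s->last_access_time_ns = now_ns;
}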
uint64_t queue_depth[BLOCK_MAX_IOTYPE];
Average number of requests. Similar to the previous one. It would
require us to keep a count of ongoing requests as well.
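Keeping the count itself is simple (sketch; 'in_flight' would be a new
field in BlockAcctStats):

unsigned in_flight[BLOCK_MAX_IOTYPE];

/* In block_acct_start() */
stats->in_flight[type]++;

/* In block_acct_done() */
stats->in_flight[cookie->type]--;

Turning that into a meaningful average over time is the part that
still needs some thought.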
About the implementation, I read that it was possible to call
block_acct_start() without calling block_acct_done(). I don't know if
that's still the case, I need to check that.
I don't know if I'm forgetting anything. I have a rough implementation
covering most of the things I described, but of course it needs to be
polished etc. before publishing.
What do you think about this? Comments and suggestions are welcome.
Thanks,
Berto
* Re: [Qemu-devel] I/O accounting overhaul
2015-06-03 13:40 [Qemu-devel] I/O accounting overhaul Alberto Garcia
@ 2015-06-03 14:18 ` Eric Blake
2015-06-05 13:55 ` Alberto Garcia
2015-06-08 13:21 ` Stefan Hajnoczi
1 sibling, 1 reply; 5+ messages in thread
From: Eric Blake @ 2015-06-03 14:18 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel
Cc: Kevin Wolf, Markus Armbruster, Stefan Hajnoczi, qemu-block,
Max Reitz
On 06/03/2015 07:40 AM, Alberto Garcia wrote:
> Hello,
>
> I would like to pick up the work that Benoît was about to start last
> year and extend the I/O accounting in QEMU. I have been reading the
> past discussions and will try to summarize all the ideas here.
>
> The current accounting code collects the following information:
>
> typedef struct BlockAcctStats {
>     uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
>     uint64_t nr_ops[BLOCK_MAX_IOTYPE];
>     uint64_t total_time_ns[BLOCK_MAX_IOTYPE];
>     uint64_t merged[BLOCK_MAX_IOTYPE];
>     uint64_t wr_highest_sector;
> } BlockAcctStats;
>
> where the arrays hold information for read, write and flush
> operations.
>
> The accounting stats are stored in the BlockDriverState, but they
> actually belong to the device backed by the BDS, so they could probably
> be moved there. For the interface we could extend BlockDeviceStats and
> add the new fields, but query-blockstats works on BDS, so maybe we need
> a new API?
>
We want stats per BDS (it would be nice to know how many reads are
satisfied from the active layer, vs. how many are satisfied from the
backing image, to know how stable and useful the backing image is). But
we also want stats per BB (how many reads did the guest attempt,
regardless of which BDS served the read). So any good solution needs to
work from both views (whether by two APIs, or by one with a flag, is
bike-shedding).
> The fields are mostly self-explanatory. merged counts the number of
> requests merged into a single one (using virtio_blk_submit_multireq),
> and wr_highest_sector is the number of the highest sector that has
> been written.
It would also be nice if wr_highest_sector could be populated even for
images that have not yet been written (right now, it starts life at 0
until a write, but if we can learn the current highest sector as part of
opening an image even for just reads, that would be a bit nicer).
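Something like this at open time, where find_highest_allocated_sector()
is a purely hypothetical helper that would walk the image's allocation
map:

stats->wr_highest_sector = find_highest_allocated_sector(bs);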
>
> In addition to those we can have:
>
> uint64_t nr_invalid_ops[BLOCK_MAX_IOTYPE];
> uint64_t nr_failed_ops[BLOCK_MAX_IOTYPE];
>
> Whether these two should also be counted as regular completed
> operations (e.g. in total_time_ns) could be configurable by the user.
>
> int64_t last_access_time_ns;
>
> This would be updated after each operation, and would be useful to
> know for how long a particular device has been idle.
>
> uint64_t latency[BLOCK_MAX_IOTYPE];
>
> The average amount added to total_time_ns[] during the past second (or
> minute, or hour; the interval would be configurable). We could also
> collect the maximum and minimum latencies for that period.
>
> This could be updated every time an operation is accounted, so I
> think it could be implemented without adding any timer.
>
> uint64_t queue_depth[BLOCK_MAX_IOTYPE];
>
> Average number of requests. Similar to the previous one. It would
> require us to keep a count of ongoing requests as well.
>
> About the implementation, I read that it was possible to call
> block_acct_start() without calling block_acct_done(). I don't know if
> that's still the case, I need to check that.
>
> I don't know if I'm forgetting anything. I have a rough implementation
> covering most of the things I described, but of course it needs to be
> polished etc. before publishing.
>
> What do you think about this? Comments and suggestions are welcome.
>
> Thanks,
>
> Berto
>
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] I/O accounting overhaul
2015-06-03 14:18 ` Eric Blake
@ 2015-06-05 13:55 ` Alberto Garcia
0 siblings, 0 replies; 5+ messages in thread
From: Alberto Garcia @ 2015-06-05 13:55 UTC (permalink / raw)
To: Eric Blake, qemu-devel
Cc: Kevin Wolf, Markus Armbruster, Stefan Hajnoczi, qemu-block,
Max Reitz
On Wed 03 Jun 2015 04:18:45 PM CEST, Eric Blake wrote:
>> The accounting stats are stored in the BlockDriverState, but they
>> actually belong to the device backed by the BDS, so they could probably
>> be moved there. For the interface we could extend BlockDeviceStats and
>> add the new fields, but query-blockstats works on BDS, so maybe we need
>> a new API?
>
> We want stats per BDS (it would be nice to know how many reads are
> satisfied from the active layer, vs. how many are satisfied from the
> backing image, to know how stable and useful the backing image is).
> But we also want stats per BB (how many reads did the guest attempt,
> regardless of which BDS served the read). So any good solution needs
> to work from both views (whether by two APIs, or by one with a flag,
> is bike-shedding).
That's right. As I said, my priority is the stats from the BB (i.e.
what the guest can see), but I agree that any solution has to consider
that we want to have both eventually.
Berto
* Re: [Qemu-devel] I/O accounting overhaul
2015-06-03 13:40 [Qemu-devel] I/O accounting overhaul Alberto Garcia
2015-06-03 14:18 ` Eric Blake
@ 2015-06-08 13:21 ` Stefan Hajnoczi
2015-06-10 11:41 ` Alberto Garcia
1 sibling, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2015-06-08 13:21 UTC (permalink / raw)
To: Alberto Garcia
Cc: Kevin Wolf, qemu-block, Markus Armbruster, qemu-devel, Max Reitz
On Wed, Jun 03, 2015 at 03:40:42PM +0200, Alberto Garcia wrote:
Please structure the patches so that each statistic or group of
statistics has its own patch. That will make it easy to review and
possibly merge a subset if some of the statistics prove to be
controversial.
> uint64_t queue_depth[BLOCK_MAX_IOTYPE];
>
> Average number of requests. Similar to the previous one. It would
> require us to keep a count of ongoing requests as well.
How is this calculated?
You can keep track of the maximum queue depth easily (I think that's
what fio does, for example). For anything else you need a concept of
time like a load average calculation.
Stefan
* Re: [Qemu-devel] I/O accounting overhaul
2015-06-08 13:21 ` Stefan Hajnoczi
@ 2015-06-10 11:41 ` Alberto Garcia
0 siblings, 0 replies; 5+ messages in thread
From: Alberto Garcia @ 2015-06-10 11:41 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, qemu-block, Markus Armbruster, qemu-devel, Max Reitz
On Mon 08 Jun 2015 03:21:19 PM CEST, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Please structure the patches so that each statistic or group of
> statistics has its own patch.
Yes, that's the plan.
>> uint64_t queue_depth[BLOCK_MAX_IOTYPE];
>>
>> Average number of requests. Similar to the previous one. It would
>> require us to keep a count of ongoing requests as well.
>
> How is this calculated?
>
> You can keep track of the maximum queue depth easily (I think that's
> what fio does, for example). For anything else you need a concept of
> time like a load average calculation.
We already need a way to compute the average latency of I/O operations
over a given period of time, so we can reuse the same mechanism here.
Benoît had already written some code for that purpose, and that's what
I'm using as the basis for my work:
https://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04844.html
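Roughly, the idea is to accumulate samples over a configurable window
and publish the average (and min/max) when the window expires. The same
accumulator can then be fed with request latencies or with the number
of in-flight requests weighted by the elapsed time. A simplified sketch
(this is not Benoît's actual code):

typedef struct {
    int64_t  window_ns;     /* length of the measurement window        */
    int64_t  window_start;  /* when the current window began           */
    uint64_t sum;           /* accumulated value                       */
    uint64_t count;         /* number of samples                       */
    uint64_t avg;           /* average published for the last window   */
} AcctWindow;

static void acct_window_add(AcctWindow *w, int64_t now, uint64_t value)
{
    if (now - w->window_start >= w->window_ns) {
        /* Window expired: publish the average and start a new one */
        w->avg = w->count ? w->sum / w->count : 0;
        w->sum = 0;
        w->count = 0;
        w->window_start = now;
    }
    w->sum += value;
    w->count++;
}

For the queue depth, 'value' would be in_flight * elapsed_ns and the
result would be sum / window_ns instead of sum / count, which gives a
time-weighted average over the window.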
Berto