Re: [Qemu-devel] [libvirt] IO accounting overhaul

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Benoît Canet" <benoit.canet@irqsave.net>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
	libvir-list@redhat.com, "Markus Armbruster" <armbru@redhat.com>,
	qemu-devel@nongnu.org, stefanha@redhat.com,
	anshul.makkar@profitbricks.com
Subject: Re: [Qemu-devel] [libvirt]  IO accounting overhaul
Date: Fri, 5 Sep 2014 16:56:34 +0200	[thread overview]
Message-ID: <20140905145634.GA3883@irqsave.net> (raw)
In-Reply-To: <20140905143031.GG4656@noname.str.redhat.com>

The Friday 05 Sep 2014 à 16:30:31 (+0200), Kevin Wolf wrote :
> Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> > Benoît Canet <benoit.canet@irqsave.net> writes:
> > 
> > > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> > >> Cc'ing libvirt following Stefan's lead.
> > >> 
> > >> Benoît Canet <benoit.canet@irqsave.net> writes:
> > >> > /* the following would compute latecies for slices of 1 seconds then toss the
> > >> >  * result and start a new slice. A weighted sumation of the instant latencies
> > >> >  * could help to implement this.
> > >> >  */
> > >> > 1s_read_average_latency
> > >> > 1s_write_average_latency
> > >> > 1s_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1
> > >> > minute slice value */
> > >> > 1m_read_average_latency
> > >> > 1m_write_average_latency
> > >> > 1m_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1 hours
> > >> > slice value */
> > >> > 1h_read_average_latency
> > >> > 1h_write_average_latency
> > >> > 1h_flush_average_latency
> > >> 
> > >> This is something like "what we added to total_FOO_time in the last
> > >> completed 1s / 1m / 1h time slice divided by the number of additions".
> > >> Just another way to accumulate the same raw data, thus no worries.
> > >> 
> > >> > /* 1 second average number of requests in flight */
> > >> > 1s_read_queue_depth
> > >> > 1s_write_queue_depth
> > >> >
> > >> > /* 1 minute average number of requests in flight */
> > >> > 1m_read_queue_depth
> > >> > 1m_write_queue_depth
> > >> >
> > >> > /* 1 hours average number of requests in flight */
> > >> > 1h_read_queue_depth
> > >> > 1h_write_queue_depth
> 
> I don't think I agree with putting fixed time periods like 1 s/min/h
> into qemu. What you need there is policy and we should probably make
> it configurable.
> 
> Do we need accounting for multiple time periods at the same time or
> would it be enough to have one and make its duration an option?
> 
> > > Optionally collecting the same data for each BDS of the graph.
> > 
> > If that's the case, keeping the shared infrastructure in the block layer
> > makes sense.
> > 
> > BDS member acct then holds I/O stats for the BDS.  We currently use it
> > for something else: I/O stats of the device model backed by this BDS.
> > That needs to move elsewhere.  Two places come to mind:
> > 
> > 1. BlockBackend, when it's available (I resumed working on it last week
> >    for a bit).  Superficially attractive, because it's close to what we
> >    have now, but then we have to deal with what to do when the backend
> >    gets disconnected from its device model, then connected to another
> >    one.
> > 
> > 2. The device models that actually implement I/O accounting.  Since
> >    query-blockstats names a backend rather than a device model, we need
> >    a BlockDevOps callback to fetch the stats.  Fetch fails when the
> >    callback is null.  Lets us distinguish "no stats yet" and "device
> >    model can't do stats", thus permits a QMP interface that doesn't lie.
> > 
> > Right now, I like (2) better.
> 
> So let's say I have some block device, which is attached to a guest
> device for a while, but then I detach it and continue using it in a
> different place (maybe another guest device or a block job). Should we
> really reset all counters in query-blockstats to 0?
> 
> I think as I user I would be surprised about this, because I still refer
> to it by the same name (the device_name, which will be in the BB), so
> it's the same thing for me and the total requests include everything
> that was ever issued against it.

It all depends on where you are in the user food chain.
If you are a cloud end user you are interested in the stats associated with
the hardware of your /dev/vdx because that's what iostat allow you to see.

> 
> > > -API wize I think about adding
> > > bdrv_acct_invalid() and
> > > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> > 
> > Complication: partial success.  Example:
> > 
> > 1. Guest requests a read of N sectors.
> > 
> > 2. Device model calls
> >    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> > 
> > 3. Device model examines the request, and deems it valid.
> > 
> > 4. Device model passes it to the block layer.
> > 
> > 5. Block layer does its thing, but for some reason only M < N sectors
> >    can be read.  Block layer returns M.
> 
> No, it returns -errno.
> 
> > 6. What's the device model to do now?  Both bdrv_acct_failed() and
> >    bdrv_acct_done() would be wrong.
> > 
> >    Should the device model account for a read of size M?  This ignores
> >    the partial failure.
> > 
> >    Should it split the read into a successful and a failed part for
> >    accounting purposes?  This miscounts the number of reads.
> 
> I think we should simply account it as a failed request because this is
> what it will look like for the guest. If you want the partial data that
> was internally issued, you need to look at different statistics
> (probably those of bs->file).
> 
> Kevin

next prev parent reply	other threads:[~2014-09-05 14:57 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-28 14:38 [Qemu-devel] IO accounting overhaul Benoît Canet
2014-08-29 16:04 ` Stefan Hajnoczi
2014-08-29 16:32   ` Benoît Canet
2014-09-01  9:52 ` Markus Armbruster
2014-09-01 10:44   ` [Qemu-devel] [libvirt] " Benoît Canet
2014-09-01 11:41     ` Markus Armbruster
2014-09-01 13:38       ` Benoît Canet
2014-09-02 13:59         ` Markus Armbruster
2014-09-05 14:30       ` Kevin Wolf
2014-09-05 14:56         ` Benoît Canet [this message]
2014-09-05 14:57         ` Benoît Canet
2014-09-05 15:24         ` Benoît Canet
2014-09-08  7:12         ` Markus Armbruster
2014-09-08  9:12           ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140905145634.GA3883@irqsave.net \
    --to=benoit.canet@irqsave.net \
    --cc=anshul.makkar@profitbricks.com \
    --cc=armbru@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).