Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
To: Divyesh Shah <dpshah@google.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
Date: Fri, 11 Jun 2010 14:31:08 +0800	[thread overview]
Message-ID: <4C11D82C.50902@cn.fujitsu.com> (raw)
In-Reply-To: <AANLkTikZSgnSqK_JMVqy9ncFse0EbZm8j4WTTcJ00hwL@mail.gmail.com>

Divyesh Shah wrote:
> On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>>> Vivek Goyal wrote:
>>>> On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>>>>> Vivek Goyal wrote:
>>>>>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>>>>>>> Vivek Goyal wrote:
>>>>>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series implements three new interfaces to keep track of tranferred bytes,
>>>>>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>>>>>>>>> from service tree, these three interfaces will reset and shows zero.
>>>>>>>> Hi Gui,
>>>>>>>>
>>>>>>>> Can you give some details regarding how this functionality is useful? Why
>>>>>>>> would somebody be interested in only in stats of till group was
>>>>>>>> backlogged and not in total stats?
>>>>>>>>
>>>>>>>> Groups can come and go so fast and these stats will reset so many times
>>>>>>>> that I am not able to visualize how these stats will be useful.
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> Currently, we assign weight to a group, but user still doesn't know how fast the
>>>>>>> group runs. With io rate interface, users can check the rate of a group at any
>>>>>>> moment, or to determine whether the weight assigned to a group is enough.
>>>>>>> bytes and time interface is just for debug purpose.
>>>>>> Gui,
>>>>>>
>>>>>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
>>>>>> or blkio.io_serviced interfaces are not good enough to determine at what
>>>>>> rate a group is doing IO.
>>>>>>
>>>>>> I think we can very well write something in userspace like "iostat" to
>>>>>> display the per group rate. Utility can read the any of the above files
>>>>>> say at the interfval of 1s, calculate the diff between the values and
>>>>>> display that as group effective rate.
>>>>> Hi Vivek,
>>>>>
>>>>> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
>>>>> value. This value represents the actual rate a group runs. IMO, io rate calculated from
>>>>> user space is not accurate in following two scenarios:
>>>>>
>>>>> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
>>>>>   rate calculated in this interval doesn't make sense.
>>>>>
>>>> If you are not servicing groups for long time, anyway it is very bad for
>>>> latency. So that's why soft limit of 300ms of CFQ makes sense and
>>>> practically I am not sure you will be blocking groups for .5s.
>>>>
>>>> Even if you do, then user just needs to choose a bigger interval and you
>>>> will see more smooth rates. Reduce the interval and you might see little
>>>> bursty rate.
>>> Vivek,
>>>
>>> IIUC, the most big problem for user app is the user app doesn't know how long
>>> the group has been dequeued during the interval. For example, user choose
>>> 10s interval, 8s of which is not backlogged, but when user app calculates
>>> io rate, this 8s still include. So this rate isn't what we want. Am i missing
>>> something?
>> Gui,
>>
>> If user application is not doing enough IO and group is getting deleted
>> fast, io_active_rate is not going to give you any meaningful data as it
>> will be lost the moment group gets deleted.
>>
>> Hence one needs to monitor the IO rate when a workload is running and is
>> keeping disk busy more or less all the time.
>>
>> Even in your example, if you monitored IO rate over 10 second interval and
>> group is not doing any IO, you just can't do anything about it. Just that
>> your measurement e method is wrong. Even io_active_rate will not help you
>> here as by the time you read the file, group is gone and there is no data.
>>
>> The very reason you want to monitor rate is that you want to make sure
>> group is getting enough BW. If group is not doing IO then one can look at
>> blkio.dequeue file and see if group is getting deleted too frequently. If
>> yes, that means group is not doing enough IO to keep the disk busy. One
>> can also try increasing the weight of the group but that will not help
>> much if group does not remain backlogged for significant amount of time.
>>
>>> "io_active_rate" will never take un-backlogged time into account when calculating
>>> io rate.
>>>
>> Theoritically blkio.sectors/blkio.time gives the rate excluding the time
>> when group was not backlogged?
> 
> I agree with Vivek here. We use blkio.time as a source for io rate
> count for each cgroup, knowing that it is not entirely accurate but a
> good enough approximation.
> 
> Gui, if you want to find out whether the cgroup has enough weight or
> not, I'd recommend looking at the wait_time stat in addition to
> blkio.time. It has been very useful in identifying jobs that are not
> getting enough IO done due to less weight assigned to them.

Ok, see. :)

Thanks,
Gui

> 
>> But I will not recommend using blkio.time as it is very approximate.
>>
>> I really am not able to see what this interface is really buying you.
>>
>> Thanks
>> Vivek
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>

     prev parent reply	other threads:[~2010-06-11  6:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-21  8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
2010-05-21  8:43 ` [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged Gui Jianfeng
2010-05-21  8:44 ` [PATCH 2/4] io-controller: a new interface to keep track of the time since group bacame backlogged Gui Jianfeng
2010-05-21  8:45 ` [PATCH 3/4] io-controller: a new interface to keep track of io rate when group is backlogged Gui Jianfeng
2010-05-21  8:46 ` [PATCH 4/4] io-controller: Document for active bytes, time and rate Gui Jianfeng
2010-05-26 18:57   ` Randy Dunlap
2010-05-21 13:17 ` [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Vivek Goyal
2010-05-24  1:12   ` Gui Jianfeng
2010-05-24 21:22     ` Vivek Goyal
2010-05-25  1:37       ` Gui Jianfeng
2010-05-25  2:03         ` Vivek Goyal
2010-05-25  3:00           ` Gui Jianfeng
2010-05-25 13:25             ` Vivek Goyal
2010-06-11  5:10               ` Divyesh Shah
2010-06-11  6:31                 ` Gui Jianfeng [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C11D82C.50902@cn.fujitsu.com \
    --to=guijianfeng@cn.fujitsu.com \
    --cc=dpshah@google.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.