From: Zhi Yong Wu
Date: Thu, 21 Mar 2013 21:04:20 +0800
In-Reply-To: <20130321091718.GA12555@stefanha-thinkpad.redhat.com>
References: <1363770734-30970-1-git-send-email-benoit@irqsave.net>
 <1363770734-30970-2-git-send-email-benoit@irqsave.net>
 <20130320132924.GB14441@stefanha-thinkpad.muc.redhat.com>
 <20130320142842.GA2389@stefanha-thinkpad.muc.redhat.com>
 <20130320145633.GA1473@irqsave.net>
 <20130320151203.GA3584@stefanha-thinkpad.muc.redhat.com>
 <1363828709.32706.86.camel@f15>
 <20130321091718.GA12555@stefanha-thinkpad.redhat.com>
Message-ID: <1363871063.32706.97.camel@f15>
Subject: Re: [Qemu-devel] [PATCH] block: fix bdrv_exceed_iops_limits wait computation
Reply-To: wuzhy@linux.vnet.ibm.com
To: Stefan Hajnoczi
Cc: Benoît Canet, kwolf@redhat.com, qemu-devel@nongnu.org, Stefan Hajnoczi

On Thu, 2013-03-21 at 10:17 +0100, Stefan Hajnoczi wrote:
> On Thu, Mar 21, 2013 at 09:18:27AM +0800, Zhi Yong Wu wrote:
> > On Wed, 2013-03-20 at 16:12 +0100, Stefan Hajnoczi wrote:
> > > On Wed, Mar 20, 2013 at 03:56:33PM +0100, Benoît Canet wrote:
> > > > > But I don't understand why bs->slice_time is modified instead of keeping
> > > > > it constant at 100 ms:
> > > > >
> > > > >     bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > > >     bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > > > >     if (wait) {
> > > > >         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > > >     }
> > > >
> > > > In bdrv_exceed_bps_limits there is an equivalent to this with a comment.
> > > >
> > > > ---------
> > > >     /* When the I/O rate at runtime exceeds the limits,
> > > >      * bs->slice_end need to be extended in order that the current statistic
> > > >      * info can be kept until the timer fire, so it is increased and tuned
> > > >      * based on the result of experiment.
> > > >      */
> > > >     bs->slice_time = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > >     bs->slice_end += bs->slice_time - 3 * BLOCK_IO_SLICE_TIME;
> > > >     if (wait) {
> > > >         *wait = wait_time * BLOCK_IO_SLICE_TIME * 10;
> > > >     }
> > > > ----------
> > >
> > > The comment explains why slice_end needs to be extended, but not why
> > > bs->slice_time should be changed (except that it was tuned as the result
> > > of an experiment).
> > >
> > > Zhi Yong: Do you remember a reason for modifying bs->slice_time?
> > Stefan,
> > In some cases the bare I/O speed is very fast on the physical machine,
> > and when the I/O speed is limited to a lower value, I/O needs to wait for
> > a relatively long time (i.e. wait_time). wait_time should be smaller than
> > slice_time; if slice_time is constant, wait_time may not reach its
> > expected value, so the throttling function will not work well.
> > For example, say the bare I/O speed is 100MB/s, the throttling limit is
> > 1MB/s, and slice_time is constant at 50ms (an assumed value) or smaller.
> > If the current I/O is to be throttled to 1MB/s, its wait_time is expected
> > to be 100ms (an assumed value), which is bigger than the current
> > slice_time, so the throttling function will not throttle the actual I/O
> > speed well. In that case, slice_time needs to be adjusted to a more
> > suitable value that depends on wait_time.
>
> When an I/O request spans a slice:
> 1. It must wait until enough resources are available.
> 2. We extend the slice so that existing accounting is not lost.
>
> But I don't understand what you say about a fast host. The bare metal
I mean that a fast host is a host with very high bare-metal throughput.

> throughput does not affect the throttling calculation. The only values
> that matter are bps limit and slice time:
>
> In your example the slice time is 50ms and the current request needs
> 100ms. We need to extend slice_end to at least 100ms so that we can
> account for this request.
>
> Why should slice_time be changed?
It isn't the only possible choice; if you have a better way, we can maybe
do it your way instead. My thinking was that if wait_time is big in the
previous slice window, slice_time should also be adjusted to be a bit
bigger for the next slice window, along the lines of the sketch below.
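Here is a rough, self-contained sketch of what I mean (this is not the
actual QEMU code; the limit, the slice length and the bytes_done value are
assumptions picked to roughly match the 1MB/s example above):

---------
/* Illustrative only: how long a request must wait under a bps limit, and
 * the two options discussed in this thread: keep slice_time fixed and only
 * push slice_end out, or also grow slice_time based on wait_time. */
#include <stdio.h>

int main(void)
{
    double bps_limit  = 1.0 * 1024 * 1024;  /* throttle limit: 1 MB/s           */
    double slice_time = 0.050;              /* current slice length: 50 ms      */
    double bytes_done = 150.0 * 1024;       /* bytes accounted so far (assumed) */

    /* Time those bytes "deserve" at the configured limit. */
    double deserved = bytes_done / bps_limit;           /* ~146 ms */

    if (deserved > slice_time) {
        double wait_time = deserved - slice_time;       /* ~96 ms  */

        /* Option 1: keep slice_time constant and extend slice_end far
         * enough to cover the waiting request. */
        printf("wait %.0f ms, extend slice_end by %.0f ms\n",
               wait_time * 1000, wait_time * 1000);

        /* Option 2 (roughly the idea behind the current code): also derive
         * a larger slice_time for the next window from wait_time. */
        double next_slice_time = slice_time + wait_time;
        printf("next slice_time would grow to %.0f ms\n",
               next_slice_time * 1000);
    }
    return 0;
}
---------

Whether growing slice_time like this is the right policy is exactly the
open question; keeping it constant and only moving slice_end, as you
suggest, would also cover the example above.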
> > In some other cases, where the bare I/O speed is very slow and the I/O
> > throttling limit is fast, slice_time also needs to be adjusted
> > dynamically based on wait_time.
>
> If the host is slower than the I/O limit there are two cases:
This is not what I mean; I mean that the bare I/O speed is faster than the
I/O limit, but the gap between them is very small.

> 1. Requests are below I/O limit. We do not throttle, the host is slow
> but that's okay.
>
> 2. Requests are above I/O limit. We throttle them but actually the host
> will slow them down further to the bare metal speed. This is also fine.
>
> Again, I don't see a need to change slice_time.
>
> BTW I discovered one thing that Linux blk-throttle does differently from
> QEMU I/O throttling: we do not trim completed slices. I think trimming
> avoids accumulating values which may lead to overflows if the slice
> keeps getting extended due to continuous I/O.
QEMU I/O throttling is not completely the same as the Linux block
throttling approach.

> blk-throttle does not modify throtl_slice (their equivalent of
> slice_time).
>
> Stefan

--
Regards,

Zhi Yong Wu