From: "Vincent, Pradeep" <pradeepv@amazon.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Jan Beulich <JBeulich@novell.com>,
	Daniel Stodden <daniel.stodden@citrix.com>
Subject: Re: [PATCH] blkback: Fix block I/O latency issue
Date: Thu, 19 May 2011 23:12:25 -0700
Message-ID: <C9FAE626.161E7%pradeepv@amazon.com>
In-Reply-To: <20110516152224.GA7195@dumpdata.com>
Hey Konrad,
Thanks for running the tests. Very useful data.
Re: Experiment to show latency improvement
I never ran anything on ramdisk.
You should be able to see the latency benefit with the 'orion' tool, but I
am sure other tools can be used as well. For a volume backed by a single
disk drive, keep the number of outstanding small random I/Os at 2 (I think
the "num_small" parameter in orion should do the job) with a 50-50 mix of
writes and reads. Measure the latencies reported by the guest and by Dom-0
and compare them. For LVM volumes that present multiple drives as a single
LUN (inside the guest), the latency improvement will be highest when the
number of outstanding I/Os is about 2x the number of spindles. This is the
'moderate I/O' scenario I was describing, and you should see a significant
improvement in latencies.
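In case fio is handier than orion, something along these lines should
approximate that single-spindle 'moderate I/O' setup. This is just a
sketch, not the exact commands I ran; the device path is a placeholder and
the 4k block size is an assumption for "small" random I/O:

fio --name=moderate-io --filename=/dev/xvdb --direct=1 --ioengine=libaio \
    --rw=randrw --rwmixread=50 --bs=4k --iodepth=2 --numjobs=1 \
    --runtime=60 --time_based --group_reporting

Run the same job inside the guest and against the backing device in Dom-0,
and compare the completion latencies fio reports.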
If you let the page cache drive sequential I/O, using dd or another
non-direct sequential I/O generator, you should find that the interrupt
rate doesn't go up even under high I/O load. Thinking about this, I think
the burstiness of I/O submission as seen by the driver is also a key
factor, particularly in the absence of the I/O coalescing waits introduced
by the I/O scheduler. Page cache draining is notoriously bursty.
>>queue depth of 256.
What 'queue depth' is this? If I am not wrong, blkfront-blkback is
restricted to at most ~32 pending I/Os, due to the limit of one page being
used for the mailbox (shared ring) entries - no?
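For reference, here is the back-of-the-envelope arithmetic behind that ~32
number. The 64-byte ring header and 112-byte entry size are my reading of
the blkif ring layout (io/blkif.h plus ring.h), so treat them as
approximations; the ring macro rounds the slot count down to a power of
two:

#include <stdio.h>

int main(void)
{
	unsigned page_size   = 4096; /* one shared page holds the whole ring  */
	unsigned ring_header = 64;   /* producer/consumer/event indexes + pad */
	unsigned entry_size  = 112;  /* request with 11 segment descriptors   */
	unsigned slots = (page_size - ring_header) / entry_size; /* ~36 */
	unsigned ring_size = 1;

	/* __RING_SIZE rounds down to a power of two */
	while (ring_size * 2 <= slots)
		ring_size *= 2;

	printf("max in-flight requests per ring: %u\n", ring_size); /* 32 */
	return 0;
}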
>>But to my surprise the case where the I/O latency is high, the interrupt
>>generation was quite small
If this patch results in an extra interrupt, it will very likely result in
a reduction of latency for the next I/O. If the increase in interrupt
generation is not high, then the number of I/Os whose latencies this patch
has improved is low. It looks like your workload belonged to this category.
Perhaps that's why you didn't see much of an improvement in overall
performance? I think this is close to the high I/O workload scenario I
described.
>>But where the I/O latency was very very small (4 microseconds) the
>>interrupt generation was on average about 20K/s.
This is not a scenario I tested, but the results aren't surprising. This
isn't the high I/O load I was describing though (I didn't test ramdisk);
an SSD is probably the closest real-world equivalent.

An increase of 20K/sec means this patch very likely improved the latency
of 20K I/Os per second, although the absolute latency improvement would be
smaller in this case. A 20K/sec interrupt rate (50 usec between
interrupts) is something I would be comfortable with if it translates
directly into latency improvement for users. The graphs seem to indicate a
5% increase in throughput for this case - am I reading the graphs right?
Overall, very useful tests indeed, and I haven't seen anything too
concerning or unexpected, except that I don't think you have seen the 50+%
latency benefit that the patch got me in my moderate I/O benchmark :-)

Feel free to ping me offline if you aren't able to see the latency impact
using the 'moderate I/O' methodology described above.
About IRQ coalescing: stepping back a bit, there are a few different use
cases that an irq coalescing mechanism would be useful for:
1. Latency-sensitive workloads: wait times of tens of usecs. Particularly
useful for SSDs.
2. Interrupt-rate-conscious workloads/environments: wait times of 200+
usecs, which would essentially cap the theoretical interrupt rate at
5K/sec.
3. Excessive CPU consumption mitigation: this is similar to (2) but
includes the case of malicious guests. Perhaps not a big concern unless
you have lots of drives attached to each guest.
I suspect the implementations for (1) and (2) would be different (spin vs.
sleep, perhaps). (3) can't be implemented by manipulating 'req_event',
since a guest has the ability to abuse the irq channel independently of
whatever 'blkback' tries to tell 'blkfront' via 'req_event'.
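To make the req_event idea concrete, here is a tiny user-space model of
the event-index scheme (plain counters standing in for the shared-ring
fields; this is not blkback code, just an illustration of how the backend
could widen the notification window to coalesce guest-to-Dom-0
interrupts):

#include <stdio.h>

static unsigned req_prod;   /* frontend: requests produced                   */
static unsigned req_cons;   /* backend: requests consumed                    */
static unsigned req_event;  /* frontend sends an event when req_prod hits it */
static unsigned interrupts; /* notifications actually delivered              */

/* Frontend: queue one request, notify only if we just crossed req_event. */
static void frontend_submit(void)
{
	req_prod++;
	if (req_prod == req_event)
		interrupts++;
}

/* Backend: drain everything, then re-arm the event index.  batch == 1 is
 * today's behaviour (notify on the very next request); batch > 1 coalesces,
 * keeping the frontend silent until 'batch' requests have piled up. */
static void backend_drain_and_rearm(unsigned batch)
{
	req_cons = req_prod;
	req_event = req_cons + batch;
}

int main(void)
{
	unsigned batch = 4, i;

	backend_drain_and_rearm(batch);
	for (i = 0; i < 32; i++) {
		frontend_submit();
		if (req_prod == req_event)   /* backend runs when notified */
			backend_drain_and_rearm(batch);
	}
	printf("32 requests -> %u interrupts (batch=%u)\n", interrupts, batch);
	return 0;
}

With batch = 1 this prints 32 interrupts for 32 requests; with batch = 4
it prints 8. The obvious cost is that the first few requests of a burst
sit on the ring until the batch fills or the backend polls, which is
exactly the latency vs. interrupt-rate trade-off above.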
(3) could be implemented in the hypervisor as a generic irq throttler that
could be leveraged for all irqs heading to Dom-0 from DomUs, including
blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
as well. Thoughts?
One crude way to address (3) for the 'many disk drive' scenario is to pin
all/most blkback interrupts for an instance to the same CPU core in Dom-0
and to throttle the thread wake-up (the wake_up(&blkif->wq) in
blkif_notify_work) that usually results in IPIs. Not an elegant solution,
but it might be a good crutch.
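Roughly what I have in mind for that crutch, purely as an illustration:
'last_wake' and 'min_wake_interval' are made-up fields rather than
anything in the real blkif structure, and a real version would need a
fallback timer so that a skipped wake-up cannot strand work on the ring:

#include <linux/jiffies.h>
#include <linux/wait.h>

struct my_blkif {                        /* stand-in for blkback's blkif */
	wait_queue_head_t wq;
	unsigned long last_wake;         /* jiffies of the last wake_up()  */
	unsigned long min_wake_interval; /* throttle window, in jiffies    */
};

/* Hypothetical rate-limited variant of blkif_notify_work(): skip the
 * wake_up() (and hence the likely cross-CPU IPI) if the backend thread
 * was woken very recently; it will see the new ring entries on its next
 * pass. */
static void blkif_notify_work_throttled(struct my_blkif *blkif)
{
	if (time_before(jiffies, blkif->last_wake + blkif->min_wake_interval))
		return;
	blkif->last_wake = jiffies;
	wake_up(&blkif->wq);
}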
Another angle on (1) and (2) is whether these irq coalescing settings
should be controllable by the guest, perhaps within limits set by the
administrator.

Thoughts? Suggestions?
Konrad, I'd love to help out if you are already working on something
around irq coalescing. And when I have irq coalescing functionality that
can be consumed by the community, I will certainly submit it.

Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
just because there is no irq coalescing functionality, particularly since
the downside is very minimal on the blkfront-blkback stack. My 2 cents.
Thanks much Konrad,
- Pradeep Vincent
On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
>On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
>> > >>what were the numbers when it came to high bandwidth numbers
>> >
>> > Under high I/O workload, where the blkfront would fill up the queue as
>> > blkback works the queue, the I/O latency problem in question doesn't
>> > manifest itself and as a result this patch doesn't make much of a
>> > difference in terms of interrupt rate. My benchmarks didn't show any
>> > significant effect.
>>
>> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
>> writing as much as they can to an iSCSI disk), the IRQ rate for each
>> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
>> job on capturing the submission latency to see if the I/Os get the
>> response back as fast (or the same) as without your patch.
>>
>> And the iSCSI disk on the target side was a RAMdisk, so latency
>> was quite small which is not fair to your problem.
>>
>> Do you have a program to measure the latency for the workload you
>> had encountered? I would like to run those numbers myself.
>
>Ran some more benchmarks over this week. This time I tried to run it on:
>
> - iSCSI target (1GB, and on the "other side" it wakes up every 1msec,
>   so the latency is set to 1msec).
> - scsi_debug delay=0 (no delay and as fast as possible; comes out to
>   about 4 microseconds completion with a queue depth of one with 32K I/Os).
> - local SATAI 80GB ST3808110AS. Still running as it is quite slow.
>
>With only one PV guest doing a round (three times) of two threads randomly
>writing I/Os with a queue depth of 256. Then a different round of four
>threads writing/reading (80/20) 512 bytes up to 64K randomly over the
>disk.
>
>I used the attached patch against #master
>(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)
>to gauge how well we are doing (and what the interrupt generation rate
>is).
>
>These workloads I think would be considered 'high I/O' and I was expecting
>your patch to not have any influence on the numbers.
>
>But to my surprise the case where the I/O latency is high, the interrupt
>generation was quite small. But where the I/O latency was very very small
>(4 microseconds) the interrupt generation was on average about 20K/s. And
>this is with a queue depth of 256 with four threads. I was expecting the
>opposite. Hence quite curious to see your use case.
>
>What do you consider the middle I/O and low I/O cases? Do you use 'fio'
>for your testing?
>
>With the high I/O load, the numbers came out to give us about 1% benefit
>with your patch. However, I am worried (maybe unnecessarily?) about the
>20K interrupt generation when the iometer tests kicked in (this was only
>when using the unrealistic 'scsi_debug' drive).
>
>The picture of this using iSCSI target:
>http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png
>
>And when done on top of local RAMdisk:
>http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png
>