From: "Vincent, Pradeep" <pradeepv@amazon.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Jan Beulich <JBeulich@novell.com>,
	Daniel Stodden <daniel.stodden@citrix.com>
Subject: Re: [PATCH] blkback: Fix block I/O latency issue
Date: Thu, 19 May 2011 23:12:25 -0700
Message-ID: <C9FAE626.161E7%pradeepv@amazon.com>
In-Reply-To: <20110516152224.GA7195@dumpdata.com>
Hey Konrad,
Thanks for running the tests. Very useful data.
Re: Experiment to show latency improvement
I never ran anything on ramdisk.
You should be able to see the latency benefit with the 'orion' tool, but I
am sure other tools can be used as well. For a volume backed by a single
disk drive, keep the number of outstanding small random I/Os at 2 (I think
the "num_small" parameter in orion should do the job) with a 50-50 mix of
writes and reads. Measure the latencies reported by the guest and by Dom-0
and compare them. For LVM volumes that present multiple drives as a single
LUN (inside the guest), the latency improvement will be highest when the
number of outstanding I/Os is about 2x the number of spindles. This is the
'moderate I/O' scenario I was describing, and you should see a significant
improvement in latencies.
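In case fio is handier than orion, something along these lines should
approximate that single-spindle 'moderate I/O' setup. This is just a
sketch, not the exact commands I ran; the device path is a placeholder and
the 4k block size is an assumption for "small" random I/O:

fio --name=moderate-io --filename=/dev/xvdb --direct=1 --ioengine=libaio \
    --rw=randrw --rwmixread=50 --bs=4k --iodepth=2 --numjobs=1 \
    --runtime=60 --time_based --group_reporting

Run the same job inside the guest and against the backing device in Dom-0,
and compare the completion latencies fio reports.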
If you let the page cache drive sequential I/O, using dd or another
non-direct sequential I/O generator, you should find that the interrupt
rate doesn't go up even under high I/O load. Thinking about this, I think
the burstiness of I/O submission as seen by the driver is also a key
factor, particularly in the absence of the I/O coalescing waits introduced
by the I/O scheduler. Page cache draining is notoriously bursty.
>>queue depth of 256.
What 'queue depth' is this? If I am not wrong, blkfront-blkback is
restricted to at most ~32 pending I/Os, due to the limit of one page being
used for the mailbox (shared ring) entries - no?
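For reference, here is the back-of-the-envelope arithmetic behind that ~32
number. The 64-byte ring header and 112-byte entry size are my reading of
the blkif ring layout (io/blkif.h plus ring.h), so treat them as
approximations; the ring macro rounds the slot count down to a power of
two:

#include <stdio.h>

int main(void)
{
	unsigned page_size   = 4096; /* one shared page holds the whole ring  */
	unsigned ring_header = 64;   /* producer/consumer/event indexes + pad */
	unsigned entry_size  = 112;  /* request with 11 segment descriptors   */
	unsigned slots = (page_size - ring_header) / entry_size; /* ~36 */
	unsigned ring_size = 1;

	/* __RING_SIZE rounds down to a power of two */
	while (ring_size * 2 <= slots)
		ring_size *= 2;

	printf("max in-flight requests per ring: %u\n", ring_size); /* 32 */
	return 0;
}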
>>But to my surprise the case where the I/O latency is high, the interrupt
>>generation was quite small
If this patch results in an extra interrupt, it will very likely result in
a reduction of latency for the next I/O. If the increase in interrupt
generation is not high, then the number of I/Os whose latencies this patch
has improved is low. It looks like your workload belonged to this category.
Perhaps that's why you didn't see much of an improvement in overall
performance? I think this is close to the high I/O workload scenario I
described.
>>But where the I/O latency was very very small (4 microseconds) the
>>interrupt generation was on average about 20K/s.
This is not a scenario I tested, but the results aren't surprising. This
isn't the high I/O load I was describing though (I didn't test ramdisk);
an SSD is probably the closest real-world equivalent.

An increase of 20K/sec means this patch very likely improved the latency
of 20K I/Os per second, although the absolute latency improvement would be
smaller in this case. A 20K/sec interrupt rate (50 usec between
interrupts) is something I would be comfortable with if it translates
directly into latency improvement for users. The graphs seem to indicate a
5% increase in throughput for this case - am I reading the graphs right?
Overall, very useful tests indeed, and I haven't seen anything too
concerning or unexpected, except that I don't think you have seen the 50+%
latency benefit that the patch got me in my moderate I/O benchmark :-)

Feel free to ping me offline if you aren't able to see the latency impact
using the 'moderate I/O' methodology described above.
About IRQ coalescing: stepping back a bit, there are a few different use
cases that an irq coalescing mechanism would be useful for:
1. Latency-sensitive workloads: wait times of tens of usecs. Particularly
useful for SSDs.
2. Interrupt-rate-conscious workloads/environments: wait times of 200+
usecs, which would essentially cap the theoretical interrupt rate at
5K/sec.
3. Excessive CPU consumption mitigation: this is similar to (2) but
includes the case of malicious guests. Perhaps not a big concern unless
you have lots of drives attached to each guest.
I suspect the implementations for (1) and (2) would be different (spin vs.
sleep, perhaps). (3) can't be implemented by manipulating 'req_event',
since a guest has the ability to abuse the irq channel independently of
whatever 'blkback' tries to tell 'blkfront' via 'req_event'.
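To make the req_event idea concrete, here is a tiny user-space model of
the event-index scheme (plain counters standing in for the shared-ring
fields; this is not blkback code, just an illustration of how the backend
could widen the notification window to coalesce guest-to-Dom-0
interrupts):

#include <stdio.h>

static unsigned req_prod;   /* frontend: requests produced                   */
static unsigned req_cons;   /* backend: requests consumed                    */
static unsigned req_event;  /* frontend sends an event when req_prod hits it */
static unsigned interrupts; /* notifications actually delivered              */

/* Frontend: queue one request, notify only if we just crossed req_event. */
static void frontend_submit(void)
{
	req_prod++;
	if (req_prod == req_event)
		interrupts++;
}

/* Backend: drain everything, then re-arm the event index.  batch == 1 is
 * today's behaviour (notify on the very next request); batch > 1 coalesces,
 * keeping the frontend silent until 'batch' requests have piled up. */
static void backend_drain_and_rearm(unsigned batch)
{
	req_cons = req_prod;
	req_event = req_cons + batch;
}

int main(void)
{
	unsigned batch = 4, i;

	backend_drain_and_rearm(batch);
	for (i = 0; i < 32; i++) {
		frontend_submit();
		if (req_prod == req_event)   /* backend runs when notified */
			backend_drain_and_rearm(batch);
	}
	printf("32 requests -> %u interrupts (batch=%u)\n", interrupts, batch);
	return 0;
}

With batch = 1 this prints 32 interrupts for 32 requests; with batch = 4
it prints 8. The obvious cost is that the first few requests of a burst
sit on the ring until the batch fills or the backend polls, which is
exactly the latency vs. interrupt-rate trade-off above.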
(3) could be implemented in the hypervisor as a generic irq throttler that
could be leveraged for all irqs heading to Dom-0 from DomUs, including
blkback/netback. Such a mechanism could potentially solve (1) and/or (2)
as well. Thoughts?
One crude way to address (3) for the 'many disk drive' scenario is to pin
all/most blkback interrupts for an instance to the same CPU core in Dom-0
and to throttle the thread wake-up (the wake_up(&blkif->wq) in
blkif_notify_work) that usually results in IPIs. Not an elegant solution,
but it might be a good crutch.
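Roughly what I have in mind for that crutch, purely as an illustration:
'last_wake' and 'min_wake_interval' are made-up fields rather than
anything in the real blkif structure, and a real version would need a
fallback timer so that a skipped wake-up cannot strand work on the ring:

#include <linux/jiffies.h>
#include <linux/wait.h>

struct my_blkif {                        /* stand-in for blkback's blkif */
	wait_queue_head_t wq;
	unsigned long last_wake;         /* jiffies of the last wake_up()  */
	unsigned long min_wake_interval; /* throttle window, in jiffies    */
};

/* Hypothetical rate-limited variant of blkif_notify_work(): skip the
 * wake_up() (and hence the likely cross-CPU IPI) if the backend thread
 * was woken very recently; it will see the new ring entries on its next
 * pass. */
static void blkif_notify_work_throttled(struct my_blkif *blkif)
{
	if (time_before(jiffies, blkif->last_wake + blkif->min_wake_interval))
		return;
	blkif->last_wake = jiffies;
	wake_up(&blkif->wq);
}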
Another angle on (1) and (2) is whether these irq coalescing settings
should be controllable by the guest, perhaps within limits set by the
administrator.

Thoughts? Suggestions?
Konrad, I'd love to help out if you are already working on something
around irq coalescing. And when I have irq coalescing functionality that
can be consumed by the community, I will certainly submit it.

Meanwhile, I wouldn't want to deny Xen users the advantage of this patch
just because there is no irq coalescing functionality, particularly since
the downside is very minimal on the blkfront-blkback stack. My 2 cents.
Thanks much Konrad,
- Pradeep Vincent
On 5/16/11 8:22 AM, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:
>On Thu, May 12, 2011 at 10:51:32PM -0400, Konrad Rzeszutek Wilk wrote:
>> > >>what were the numbers when it came to high bandwidth numbers
>> >
>> > Under high I/O workload, where the blkfront would fill up the queue as
>> > blkback works the queue, the I/O latency problem in question doesn't
>> > manifest itself and as a result this patch doesn't make much of a
>> > difference in terms of interrupt rate. My benchmarks didn't show any
>> > significant effect.
>>
>> I have to rerun my benchmarks. Under high load (so 64Kb, four threads
>> writing as much as they can to an iSCSI disk), the IRQ rate for each
>> blkif went from 2-3/sec to ~5K/sec. But I did not do a good
>> job on capturing the submission latency to see if the I/Os get the
>> response back as fast (or the same) as without your patch.
>>
>> And the iSCSI disk on the target side was a RAMdisk, so latency
>> was quite small which is not fair to your problem.
>>
>> Do you have a program to measure the latency for the workload you
>> had encountered? I would like to run those numbers myself.
>
>Ran some more benchmarks over this week. This time I tried to run it on:
>
> - iSCSI target (1GB, and on the "other side" it wakes up every 1msec,
>   so the latency is set to 1msec).
> - scsi_debug delay=0 (no delay and as fast as possible; comes out to
>   about 4 microseconds completion with a queue depth of one with 32K I/Os).
> - local SATAI 80GB ST3808110AS. Still running as it is quite slow.
>
>With only one PV guest doing a round (three times) of two threads randomly
>writing I/Os with a queue depth of 256. Then a different round of four
>threads writing/reading (80/20) 512 bytes up to 64K randomly over the
>disk.
>
>I used the attached patch against #master
>(git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git)
>to gauge how well we are doing (and what the interrupt generation rate
>is).
>
>These workloads I think would be considered 'high I/O' and I was expecting
>your patch to not have any influence on the numbers.
>
>But to my surprise the case where the I/O latency is high, the interrupt
>generation was quite small. But where the I/O latency was very very small
>(4 microseconds) the interrupt generation was on average about 20K/s. And
>this is with a queue depth of 256 with four threads. I was expecting the
>opposite. Hence quite curious to see your use case.
>
>What do you consider the middle I/O and low I/O cases? Do you use 'fio'
>for your testing?
>
>With the high I/O load, the numbers came out to give us about 1% benefit
>with your patch. However, I am worried (maybe unnecessarily?) about the
>20K interrupt generation when the iometer tests kicked in (this was only
>when using the unrealistic 'scsi_debug' drive).
>
>The picture of this using iSCSI target:
>http://darnok.org/xen/amazon/iscsi_target/iometer-bw.png
>
>And when done on top of local RAMdisk:
>http://darnok.org/xen/amazon/scsi_debug/iometer-bw.png
>