* Re: Re: poor domU VBD performance.
2005-03-28 23:38 ` Peter Bier
@ 2005-03-29 0:27 ` Andrew Theurer
0 siblings, 0 replies; 41+ messages in thread
From: Andrew Theurer @ 2005-03-29 0:27 UTC (permalink / raw)
To: Peter Bier, xen-devel
> My dd command was always the same: "dd if=/dev/hdb6 bs=64k count=1000" and
> it took 1.6 seconds on hdb6 and 2.2 seconds on hda1 when running in Dom0
> and it took 4.6 seconds on hdb6 and 5.8 seconds on hda1 when running on
> DomU. I did one experiment with count=10000 and it took ten times as long
> in each of the four cases.
>
> I have done the following tests:
> DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 301 sec
> DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 370 sec
>
> Dom0 : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 115 sec
> Dom0 : dd if=/dev/hda1 of=/dev/null bs=1024k count=4000 ; duration 140 sec
OK, I have reproduced this with both dd and o-direct now. With o-direct, I used
what was the effective dd block request size (128k) and I got similar
results. My results are much worse, because I am driving 14 disks:
dom0: 153.5 MB/sec
domU: 12.7 MB/sec
It looks like there might be a problem where we are not getting a timely
response back from the dom0 VBD driver that the io request is complete, which
limits the number of outstanding requests to a level which cannot keep the
disk utilized well. If you drive enough outstanding IO requests (which can
be done with either o-direct with large requests or a much larger readahead
setting with buffered IO), it's not an issue.
In the domU, can you try setting the readahead size to a much larger value
using hdparm? Something like hdparm -a 2028, then run dd?
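For reference, hdparm -a takes its read-ahead value as a count of 512-byte sectors, so the settings discussed in this thread convert to sizes as in the small Python sketch below. The helper name readahead_kib is made up for illustration.

```python
SECTOR_BYTES = 512  # hdparm -a counts read-ahead in 512-byte sectors


def readahead_kib(sectors):
    """Convert a read-ahead setting given in sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024


for sectors in (256, 512, 1024, 2048):
    print(f"{sectors:5d} sectors = {readahead_kib(sectors):6.0f} KiB")
```

A setting of 256 sectors is therefore 128 KiB of read-ahead, and 2048 sectors is 1 MiB.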
-Andrew
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-03-29 8:13 Ian Pratt
2005-03-29 18:39 ` Andrew Theurer
0 siblings, 1 reply; 41+ messages in thread
From: Ian Pratt @ 2005-03-29 8:13 UTC (permalink / raw)
To: Andrew Theurer, Peter Bier, xen-devel
> It looks like there might be a problem were we are not
> getting a timely
> response back from dom0 VBD driver that the io request is
> complete, which
> limits the number of outstanding requests to a level which
> cannot keep the
> disk utilized well. If you drive enough IO outstanding
> requests (which can
> be done with either o-direct with large request or a much
> larger readahead
> setting with buffered IO), it's not an issue.
Andrew, please could you try this with a 2.4 dom0, 2.6 domU.
Thanks,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: RE: poor domU VBD performance.
2005-03-28 20:14 Ian Pratt
2005-03-28 21:48 ` Andrew Theurer
@ 2005-03-29 13:34 ` Kurt Garloff
1 sibling, 0 replies; 41+ messages in thread
From: Kurt Garloff @ 2005-03-29 13:34 UTC (permalink / raw)
To: Ian Pratt; +Cc: Peter Bier, Andrew Theurer, xen-devel
[-- Attachment #1.1: Type: text/plain, Size: 678 bytes --]
Hi,
On Mon, Mar 28, 2005 at 09:14:10PM +0100, Ian Pratt wrote:
> > Is the second disk exactly the same as the first one? I'll
> > try an IO test
> > here on the same disk array with dom0 and domU and see what I get.
>
> I've reproduced the problem and its a real issue.
>
> It only affects reads, and is almost certainly down to how the blkback
> driver passes requests down to the actual device.
Two points to look at:
* Block size (filesystems set this to 4k normally, default is 1k)
* Read ahead (you need to do it, otherwise you end up doing tiny
requests). You can tune it in sysfs.
Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]
[-- Attachment #2: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-03-29 13:38 Ian Pratt
2005-03-29 14:28 ` B.G. Bruce
0 siblings, 1 reply; 41+ messages in thread
From: Ian Pratt @ 2005-03-29 13:38 UTC (permalink / raw)
To: peter bier, xen-devel
> "dd if=/dev/hde1 of=/dev/null bs=1024k count=1024"
>
> in domU.
>
> hdparm told that the default setup was 256k readahead.
Do you mean KB or sectors?
> I have tested the performance with the following readahead settings:
>
> readahead | duration
> 128 sectors | 160 sec
> 256 sectors | 76 sec
> 512 sectors | 18.5 sec
> 1024 sectors | 19.5 sec
> 1200 sectors | 457 sec
> dom0 takes 18.0 secs no matter of the readahead setting in Dom0 is.
Would you mind repeating these experiments with a 2.4 dom0 and a 2.6 domU?
The performance cliff below 512 and above 1024 sectors is spectacular.
This is all rather confusing, but at least we know it can be made to
work fast. Changing the domU readahead is unlikely to be the right fix.
We just need to figure out how to keep it on the sweet spot...
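To put the durations quoted above on a throughput scale, here is a rough Python sketch. It assumes the dd run read exactly 1024 MiB (bs=1024k count=1024) and uses the 18.0 s dom0 time as the baseline; the variable and function names are illustrative only.

```python
TOTAL_MIB = 1024  # dd bs=1024k count=1024 reads 1024 blocks of 1 MiB

# (readahead in sectors, dd duration in seconds) as reported for domU
DOMU_RUNS = [(128, 160.0), (256, 76.0), (512, 18.5), (1024, 19.5), (1200, 457.0)]
DOM0_SECONDS = 18.0


def throughput_mib_s(seconds, total_mib=TOTAL_MIB):
    """Average throughput of one dd run in MiB/s."""
    return total_mib / seconds


for sectors, seconds in DOMU_RUNS:
    rate = throughput_mib_s(seconds)
    slowdown = seconds / DOM0_SECONDS
    print(f"readahead {sectors:4d}: {rate:6.2f} MiB/s, {slowdown:5.1f}x dom0 time")
```

On these numbers the sweet spot (512 to 1024 sectors) is within about 10% of dom0, while 1200 sectors is roughly 25 times slower.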
Thanks,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-03-29 14:19 Ian Pratt
0 siblings, 0 replies; 41+ messages in thread
From: Ian Pratt @ 2005-03-29 14:19 UTC (permalink / raw)
To: Ian Pratt, peter bier, xen-devel
> Would you mind repeating these experiments with a 2.4 dom0
> and a 2.6domU
> ?
Also, please could you try exporting the device to the dom0 as a scsi
device e.g. sda1 rather than ide device hde1 or hda1. [Yes, I know this
shouldn't make any difference, but I have a suspicion it will.]
Thanks,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
2005-03-29 13:38 Ian Pratt
@ 2005-03-29 14:28 ` B.G. Bruce
0 siblings, 0 replies; 41+ messages in thread
From: B.G. Bruce @ 2005-03-29 14:28 UTC (permalink / raw)
To: Ian Pratt; +Cc: peter bier, xen-devel
Has anyone looked into using the other schedulers? Potentially noop or
deadline for Dom0 with deadline or anticipatory for DomUs, or go the
other way - noop/deadline in DomUs and cfq/deadline in Dom0? It
actually makes some sense that DomU performance would be degraded from
Dom0's, as the request has to go through its scheduler (which typically
was designed to run async with some "fairness queuing"), which is limited
by XEN's bvt scheduler, and then Dom0's disk i/o scheduler, which is also
limited by XEN's bvt scheduler (isn't it?). Enabling/disabling
Preemptible Kernel may also shed some light on the situation.
If this becomes an item open to modification for performance reasons,
I'd prefer to have Dom0 set the performance of the DomU's. It wouldn't
really matter for the moment, but once DomU's get to boot their own
kernel (as in hosting services providing xen'd servers/services where
the client can compile their own kernel - which has been talked about -
this will become a requirement/feature request).
I was actually going to do some testing in these areas, but my test box
(AMD 3000+/water cooled) overheated and fried the northbridge/memory.
Oh, the joys of living in the tropics ;-) A new MB (upgraded to AMD64)
should arrive end of the week or early next week so I can test then if
no one else gets around to it.
Regards,
Brian.
On Tue, 2005-03-29 at 09:38, Ian Pratt wrote:
> > "dd if=/dev/hde1 of=/dev/null bs=1024k count=1024"
> >
> > in domU.
> >
> > hdparm told that the default setup was 256k readahead.
>
> Do you mean KB or sectors?
>
> > I have tested the performance with the following readahead settings:
> >
> > readahead | duration
> > 128 sectors | 160 sec
> > 256 sectors | 76 sec
> > 512 sectors | 18.5 sec
> > 1024 sectors | 19.5 sec
> > 1200 sectors | 457 sec
> > dom0 takes 18.0 secs no matter of the readahead setting in Dom0 is.
>
> Would you mind repeating these experiments with a 2.4 dom0 and a 2.6domU
> ?
>
> The performance cliff below 512 and above 1024 sectors is spectacular.
> This is all rather confusing, but at least we know it can be made to
> work fast. Changing the domU readahead is unlikely to be the right fix.
> We just need to figure out how to keep it on the sweet spot...
> Thanks,
> Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Re: poor domU VBD performance.
2005-03-29 8:13 Ian Pratt
@ 2005-03-29 18:39 ` Andrew Theurer
2005-03-29 19:13 ` Steven Hand
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Theurer @ 2005-03-29 18:39 UTC (permalink / raw)
To: Ian Pratt, Peter Bier, xen-devel
[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]
On Tuesday 29 March 2005 02:13, Ian Pratt wrote:
> > It looks like there might be a problem were we are not
> > getting a timely
> > response back from dom0 VBD driver that the io request is
> > complete, which
> > limits the number of outstanding requests to a level which
> > cannot keep the
> > disk utilized well. If you drive enough IO outstanding
> > requests (which can
> > be done with either o-direct with large request or a much
> > larger readahead
> > setting with buffered IO), it's not an issue.
>
> Andrew, please could you try this with a 2.4 dom0, 2.6 domU.
2.4 might be a little while for me, as I am running Fedora Core 3 with udev.
If anyone has an easy way to get around the hotplug/udev stuff, then I can
do this.
I did run a sequential read on a single disk again (using noop IO schedulers
in both domains) with various request sizes with o_direct while capturing
iostat output. The results are interesting. I have included the data in a
file because it would just line wrap and be unreadable in this email text.
Notice the service commit times for domU tests. It's like the IO request
queue is being plugged for a minimum of 10ms in dom0. Merges happening for
>4K requests in dom0 (while hosting domU's IO) seem to support this.
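The effect of a roughly 10 ms floor per request can be estimated with a back-of-the-envelope Python sketch. It assumes one request outstanding at a time (the O_DIRECT case, queue depth of about 1); the function name is illustrative.

```python
def max_throughput_kb_s(request_kb, latency_ms, outstanding=1):
    """Upper bound on throughput when each request needs latency_ms to complete."""
    requests_per_s = outstanding * 1000.0 / latency_ms
    return request_kb * requests_per_s


# With a 10 ms floor, throughput scales only with the request size.
for request_kb in (4, 40, 400):
    print(f"{request_kb:4d}k requests: {max_throughput_kb_s(request_kb, 10.0):8.0f} kB/s")
```

With a 10 ms floor this gives 400 kB/s for 4k requests and 4000 kB/s for 40k requests, which is what the attached domU rows report in the rkB/s column.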
-Andrew
[-- Attachment #2: rawio-comp --]
[-- Type: text/plain, Size: 1798 bytes --]
read from /dev/sdc (qlogic FC, 36GB 15k disk) with O_DIRECT enabled, with iostat output
iostat -x 5
-----------------------------------------------------------------------------------------------------
cfg   reqsize    rrqm/s  wrqm/s      r/s   w/s     rsec/s  wsec/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
----  -------  --------  ------  -------  ----  ---------  ------  --------  -----  --------  --------  -----  -----  ------
xenU  4k           0.00    0.00   100.00  0.00     800.00    0.00    400.00   0.00      8.00      1.00  10.00  10.00  100.00  <-from dom0
                   0.00    0.00   100.00  0.00     800.00    0.00    400.00   0.00      8.00      1.00  10.00  10.00  100.00  <-from domU
xenU  40k        900.00    0.00   100.00  0.00    8000.00    0.00   4000.00   0.00     80.00      1.00  10.00  10.00  100.00  <-from dom0
                   0.00    0.00    98.60  0.00    7888.00    0.00   3944.00   0.00     80.00      1.00  10.14  10.14  100.00  <-from domU
xenU  400k      9900.00    0.00   100.00  0.00   80000.00    0.00  40000.00   0.00    800.00      1.00  10.00  10.00  100.00  <-from dom0
                   0.00    0.00  1000.00  0.00   80000.00    0.00  40000.00   0.00     80.00     10.00  10.00   1.00  100.00  <-from domU
xenU  4000k    14512.60    0.00   146.40  0.00  117273.60    0.00  58636.80   0.00    801.05      4.57  31.16   6.83  100.00  <-from dom0
                   0.00    0.00  1328.60  0.00  116800.00    0.00  58400.00   0.00     87.91     50.69  38.15   0.75   99.80  <-from domU
xen0  4k           0.00    0.00  5883.60  0.00   47068.80    0.00  23534.40   0.00      8.00      0.99   0.17   0.17   98.60
xen0  40k          0.00    0.00  1452.20  0.00  116160.00    0.00  58080.00   0.00     79.99      1.00   0.69   0.69   99.60
xen0  400k         0.00    0.00   145.00  0.00  116000.00    0.00  58000.00   0.00    800.00      1.00   6.88   6.88   99.80
xen0  4000k        0.00    0.00   116.40  0.00  115200.00    0.00  57600.00   0.00    989.69      4.54  38.93   8.59  100.00
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Re: poor domU VBD performance.
2005-03-29 18:39 ` Andrew Theurer
@ 2005-03-29 19:13 ` Steven Hand
0 siblings, 0 replies; 41+ messages in thread
From: Steven Hand @ 2005-03-29 19:13 UTC (permalink / raw)
To: Andrew Theurer; +Cc: Ian Pratt, xen-devel, Peter Bier
> On Tuesday 29 March 2005 02:13, Ian Pratt wrote:
> > > It looks like there might be a problem were we are not
> > > getting a timely
> > > response back from dom0 VBD driver that the io request is
> > > complete, which
> > > limits the number of outstanding requests to a level which
> > > cannot keep the
> > > disk utilized well. If you drive enough IO outstanding
> > > requests (which can
> > > be done with either o-direct with large request or a much
> > > larger readahead
> > > setting with buffered IO), it's not an issue.
> >
> > Andrew, please could you try this with a 2.4 dom0, 2.6 domU.
>
> 2.4 might be a little while for me, as I an running Fedora core3 with udev.
> If anyone has any easy way to get around the hotplug/udev stuff, then I can
> do this.
You can run a populated /dev "underneath" the udev stuff quite happily;
e.g. if you boot into FC3 w/ udev do:
cd /dev/
tar zcpf /root/foo.tgz .
If you can boot from a rescue CD or similar, just mount your FC3
partition and untar the device nodes.
Works just fine.
> I did run a sequential read on a single disk again (using noop IO schedulers
> in both domains) with various request sizes with o_direct while capturing
> iostsat output. The results are interesting. I have included the data in a
> file because it would just line wrap an be unreadable in this email text.
> Notice the service commit times for domU tests. It's like the IO request
> queue is being plugged for a minimum of 10ms in dom0. Merges happening for
> >4K requests in dom0 (while hosting domU's IO) seem to support this.
[snip]
Ah - thanks for this -- will take a detailed look shortly.
cheers,
S.
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: RE: RE: poor domU VBD performance.
@ 2005-03-30 11:16 Ian Pratt
2005-03-30 17:01 ` peter bier
2005-03-31 7:05 ` RE: " Jens Axboe
0 siblings, 2 replies; 41+ messages in thread
From: Ian Pratt @ 2005-03-30 11:16 UTC (permalink / raw)
To: Jens Axboe, Kurt Garloff
Cc: Vincent Hanquez, Xen development list, Christian Limpach
> I'll check the xen block driver to see if there's anything
> else that sticks out.
>
> Jens Axboe
Jens, I'd really appreciate this.
The blkfront/blkback drivers have rather evolved over time, and I don't
think any of the core team fully understand the block-layer differences
between 2.4 and 2.6.
There's also some junk left in there from when the backend was in Xen
itself back in the days of 1.2, though Vincent has prepared a patch to
clean this up and also make 'refreshing' of vbd's work (for size
changes), and also allow the blkfront driver to import whole disks
rather than partitions. We had this functionality on 2.4, but lost it in
the move to 2.6.
My bet is that it's the 2.6 backend that is where the true performance
bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
to give good performance under a wide variety of circumstances. Using a
2.6 dom0 is far more pernickety. I agree with Andrew: I suspect it's
the work queue changes that are biting us when we don't have many
outstanding requests.
Thanks,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
@ 2005-03-30 17:01 ` peter bier
2005-03-30 18:05 ` Andrew Theurer
2005-03-31 7:05 ` RE: " Jens Axboe
1 sibling, 1 reply; 41+ messages in thread
From: peter bier @ 2005-03-30 17:01 UTC (permalink / raw)
To: xen-devel
Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
>
> > I'll check the xen block driver to see if there's anything
> > else that sticks out.
> >
> > Jens Axboe
>
> Jens, I'd really appreciate this.
>
> The blkfront/blkback drivers have rather evolved over time, and I don't
> think any of the core team fully understand the block-layer differences
> between 2.4 and 2.6.
>
> There's also some junk left in there from when the backend was in Xen
> itself back in the days of 1.2, though Vincent has prepared a patch to
> clean this up and also make 'refreshing' of vbd's work (for size
> changes), and also allow the blkfront driver to import whole disks
> rather than paritions. We had this functionality on 2.4, but lost it in
> the move to 2.6.
>
> My bet is that it's the 2.6 backend that is where the true perofrmance
> bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
> to give good performance under a wide variety of circumstances. Using a
> 2.6 dom0 is far more pernickety. I agree with Andrew that I suspect it's
> the work queue changes are biting us when we don't have many outstanding
> requests.
>
> Thanks,
> Ian
>
I have done my simple dd on hde1 with two different settings of readahead:
256 sectors and 512 sectors.
These are the results:
DOM0 readahead 512s
Device:     rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
hde      115055.40    2.00    592.40  0.80  115647.80   22.40  57823.90  11.20    194.99      2.30   3.88   1.68   99.80
hda           0.00    0.00      0.00  0.00       0.00    0.00      0.00   0.00      0.00      0.00   0.00   0.00    0.00
avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00   31.60   14.20   54.00

DOMU readahead 512s
Device:     rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
hda1          0.00    0.20      0.00  0.00       0.00    3.20      0.00   1.60      0.00      0.00   0.00   0.00    0.00
hde1     102301.40    0.00  11571.00  0.00  113868.80    0.00  56934.40   0.00      9.84     68.45   5.92   0.09  100.00
avg-cpu:  %user   %nice %system %iowait   %idle
           0.00    0.00   35.00   65.00    0.00

DOM0 readahead 256s
Device:     rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
hde       28289.20    1.80    126.80  0.40   28416.00   17.60  14208.00   8.80    223.53      1.06   8.32   7.85   99.80
hda           0.00    0.00      0.00  0.00       0.00    0.00      0.00   0.00      0.00      0.00   0.00   0.00    0.00
avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.60    5.60   92.60

DOMU readahead 256s
Device:     rrqm/s  wrqm/s       r/s   w/s     rsec/s  wsec/s     rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
hda1          0.00    0.20      0.00  0.40       0.00    4.80      0.00   2.40     12.00      0.00   0.00   0.00    0.00
hde1      25085.60    0.00   3330.40  0.00   28416.00    0.00  14208.00   0.00      8.53     30.54   9.17   0.30  100.00
avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.40   98.40    0.00

What surprises me is that the service time for requests in DOM0 decreases
dramatically when readahead is increased from 256 to 512 sectors. If the output
of iostat is reliable, it tells me requests in DOMU are assembled to about 8
to 10 sectors in size, while DOM0 puts them together to about 200 or even more
sectors.
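The request sizes mentioned here can be cross-checked against the raw iostat rates: avgrq-sz is the number of sectors transferred per second divided by the number of requests issued per second. A small Python sketch using the figures reported above (the function name is illustrative):

```python
def avg_request_sectors(r_s, w_s, rsec_s, wsec_s):
    """Average request size in sectors, as iostat derives avgrq-sz."""
    requests = r_s + w_s
    return (rsec_s + wsec_s) / requests if requests else 0.0


# (r/s, w/s, rsec/s, wsec/s) taken from the iostat output in this message
samples = {
    "dom0 hde,  readahead 512": (592.40, 0.80, 115647.80, 22.40),
    "domU hde1, readahead 512": (11571.00, 0.00, 113868.80, 0.00),
    "dom0 hde,  readahead 256": (126.80, 0.40, 28416.00, 17.60),
    "domU hde1, readahead 256": (3330.40, 0.00, 28416.00, 0.00),
}

for name, rates in samples.items():
    print(f"{name}: {avg_request_sectors(*rates):7.2f} sectors/request")
```

The results match the avgrq-sz column: about 9 sectors per request in domU against roughly 200 in dom0.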
Using a readahead of 256 sectors results in an average queue size of about 1,
while changing readahead to 512 sectors results in an average queue size of
slightly above 2 on DOM0. Service times in DOM0 with a readahead of 256 sectors
seem to be in the range of the typical seek time of a modern IDE disk, while
they are significantly lower with a readahead of 512 sectors.
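As a sanity check on whether the iostat output is self-consistent, the queue sizes can be compared with Little's law: average queue size is approximately the request rate multiplied by the average time each request spends in the system (await). A minimal Python sketch, with an illustrative function name:

```python
def expected_queue_size(requests_per_s, await_ms):
    """Little's law: average number in the system = arrival rate x time in system."""
    return requests_per_s * await_ms / 1000.0


# (r/s + w/s, await in ms) from the iostat output in this message
samples = {
    "dom0 hde,  readahead 512": (592.40 + 0.80, 3.88),
    "domU hde1, readahead 512": (11571.00, 5.92),
    "dom0 hde,  readahead 256": (126.80 + 0.40, 8.32),
    "domU hde1, readahead 256": (3330.40, 9.17),
}

for name, (rate, await_ms) in samples.items():
    print(f"{name}: expected avgqu-sz {expected_queue_size(rate, await_ms):6.2f}")
```

The computed values (about 2.30, 68.50, 1.06 and 30.54) agree closely with the reported avgqu-sz column, which suggests the figures are at least internally consistent.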
As I have mentioned, this is the system with only one installed disk; this
results in the write activity on the disk. The two write requests per second
go into a different partition, and those result in four required seeks per
second. This should not be a reason for all requests to take about one seek
time as service time.
I have done a number of further tests on various systems. In most cases I failed
to achieve service times below 8 msecs in Dom0; the only counterexample is
reported above. It seems to me that at low readahead values the amount of
data requested from disk is simply the readahead amount of data. This
request takes about one seek time, and thus I get lower performance when I work
with small readahead values.
What I do not understand at all is why throughput collapses with large
readahead sizes.
I found in mm/readahead.c that the readahead size for a file is updated if
the readahead is not efficient. I suspect that this mechanism might lead to
readahead being switched off for this file.
With readahead being set to 2048 sectors, the product of avgqu-sz and avgrq-sz
reported by iostat drops to 4 to 5 physical pages.
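The product of queue size and request size gives the amount of data outstanding at any moment. A short Python sketch of that calculation follows; the 4 KiB page size is an assumption (typical for x86) and the function name is illustrative.

```python
SECTOR_BYTES = 512
PAGE_BYTES = 4096  # assumed page size (x86); adjust for other architectures


def outstanding_pages(avgrq_sz_sectors, avgqu_sz):
    """Data in flight, in pages: average request size times average queue size."""
    return avgrq_sz_sectors * avgqu_sz * SECTOR_BYTES / PAGE_BYTES


# domU hde1 figures from the iostat output in this message
print(f"readahead 256: {outstanding_pages(8.53, 30.54):5.1f} pages in flight")
print(f"readahead 512: {outstanding_pages(9.84, 68.45):5.1f} pages in flight")
```

For the 256-sector case this comes to about 32.6 pages, very close to the 32 pages (256 sectors) of configured readahead, which fits the observation that at low readahead values the outstanding data is simply the readahead amount.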
Peter
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Re: poor domU VBD performance.
2005-03-30 17:01 ` peter bier
@ 2005-03-30 18:05 ` Andrew Theurer
0 siblings, 0 replies; 41+ messages in thread
From: Andrew Theurer @ 2005-03-30 18:05 UTC (permalink / raw)
To: peter bier; +Cc: xen-devel
peter bier wrote:
>Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
>
>
>
>>>I'll check the xen block driver to see if there's anything
>>>else that sticks out.
>>>
>>>Jens Axboe
>>>
>>>
>>Jens, I'd really appreciate this.
>>
>>The blkfront/blkback drivers have rather evolved over time, and I don't
>>think any of the core team fully understand the block-layer differences
>>between 2.4 and 2.6.
>>
>>There's also some junk left in there from when the backend was in Xen
>>itself back in the days of 1.2, though Vincent has prepared a patch to
>>clean this up and also make 'refreshing' of vbd's work (for size
>>changes), and also allow the blkfront driver to import whole disks
>>rather than paritions. We had this functionality on 2.4, but lost it in
>>the move to 2.6.
>>
>>My bet is that it's the 2.6 backend that is where the true perofrmance
>>bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
>>to give good performance under a wide variety of circumstances. Using a
>>2.6 dom0 is far more pernickety. I agree with Andrew that I suspect it's
>>the work queue changes are biting us when we don't have many outstanding
>>requests.
>>
>>Thanks,
>>Ian
>>
>>
>>
>
>
>I have done my simple dd on hde1 with two different setting of readahead:
>256 sectors and 512 sectors.
>
I added a counter and incremented it every time the blkback daemon was woken up
and ran the read test in domU. With 32k and 320k request sizes
(o_direct), I consistently got 200 wake ups/second. I expected
100/second, the same interval as the minimum svc cmt times I am seeing,
but anyway, 200/sec is way too low for small request sizes. I think this
confirms the latency issue. Not sure yet why it cannot wake up more
frequently.
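To see why 200 wake-ups per second is too low for small requests, here is a rough Python sketch. It assumes, purely for illustration, that each wake-up pushes a fixed number of requests to the disk (one by default); that ratio was not measured in this thread.

```python
def wakeup_interval_ms(wakeups_per_s):
    """Average time between wake-ups in milliseconds."""
    return 1000.0 / wakeups_per_s


def throughput_kb_s(wakeups_per_s, request_kb, requests_per_wakeup=1):
    """Throughput bound if each wake-up handles requests_per_wakeup requests (assumed)."""
    return wakeups_per_s * requests_per_wakeup * request_kb


print(f"interval at 200 wake-ups/s: {wakeup_interval_ms(200):.1f} ms")
for request_kb in (32, 320):
    print(f"{request_kb:3d}k requests: at most {throughput_kb_s(200, request_kb)} kB/s")
```

Under that assumption, 32k requests would top out at 6400 kB/s, far below what the disk delivers to dom0 directly.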
-Andrew
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: RE: RE: poor domU VBD performance.
2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
2005-03-30 17:01 ` peter bier
@ 2005-03-31 7:05 ` Jens Axboe
2005-03-31 7:10 ` Jens Axboe
1 sibling, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 7:05 UTC (permalink / raw)
To: Ian Pratt
Cc: Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On Wed, Mar 30 2005, Ian Pratt wrote:
> > I'll check the xen block driver to see if there's anything
> > else that sticks out.
> >
> > Jens Axboe
>
> Jens, I'd really appreciate this.
>
> The blkfront/blkback drivers have rather evolved over time, and I don't
> think any of the core team fully understand the block-layer differences
> between 2.4 and 2.6.
>
> There's also some junk left in there from when the backend was in Xen
> itself back in the days of 1.2, though Vincent has prepared a patch to
> clean this up and also make 'refreshing' of vbd's work (for size
> changes), and also allow the blkfront driver to import whole disks
> rather than paritions. We had this functionality on 2.4, but lost it in
> the move to 2.6.
>
> My bet is that it's the 2.6 backend that is where the true perofrmance
> bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
> to give good performance under a wide variety of circumstances. Using a
> 2.6 dom0 is far more pernickety. I agree with Andrew that I suspect it's
> the work queue changes are biting us when we don't have many outstanding
> requests.
You never schedule the queues you submit the io against for the 2.6
kernel, you only have a tq_disk run for 2.4 kernels. This basically puts
you at the mercy of the timeout unplugging, which is really suboptimal
unless you can keep the io queue of the target busy at all times.
You need to either mark the last bio going to that device as BIO_RW_SYNC,
or do a blk_run_queue() on the target queue after having submitted all
io in this batch for it.
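The cost of relying on the unplug timeout can be illustrated with a toy Python model. It is not kernel code: the 10 ms timeout is an assumption chosen to match the floor observed earlier in the thread, and the 0.17 ms service time is the dom0 svctm for 4k requests from Andrew's data.

```python
def completion_ms(service_ms, unplug_timeout_ms, kicked):
    """Time for a request submitted to an idle, plugged queue to complete.

    kicked=True models an explicit queue run right after submission;
    kicked=False models waiting for the unplug timer to fire.
    """
    wait_ms = 0.0 if kicked else unplug_timeout_ms
    return wait_ms + service_ms


def requests_per_second(service_ms, unplug_timeout_ms, kicked):
    """Request rate with one request outstanding at a time."""
    return 1000.0 / completion_ms(service_ms, unplug_timeout_ms, kicked)


for kicked in (False, True):
    rate = requests_per_second(0.17, 10.0, kicked)
    print(f"kicked={kicked!s:5}: about {rate:7.0f} requests/s")
```

The model gives roughly 98 requests/s without the kick and roughly 5882 requests/s with it, close to the 100 r/s (domU) and 5883.60 r/s (dom0) seen in the 4k rows of the iostat data.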
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: RE: RE: poor domU VBD performance.
2005-03-31 7:05 ` RE: " Jens Axboe
@ 2005-03-31 7:10 ` Jens Axboe
2005-03-31 8:17 ` Keir Fraser
0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 7:10 UTC (permalink / raw)
To: Ian Pratt
Cc: Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Jens Axboe wrote:
> On Wed, Mar 30 2005, Ian Pratt wrote:
> > > I'll check the xen block driver to see if there's anything
> > > else that sticks out.
> > >
> > > Jens Axboe
> >
> > Jens, I'd really appreciate this.
> >
> > The blkfront/blkback drivers have rather evolved over time, and I don't
> > think any of the core team fully understand the block-layer differences
> > between 2.4 and 2.6.
> >
> > There's also some junk left in there from when the backend was in Xen
> > itself back in the days of 1.2, though Vincent has prepared a patch to
> > clean this up and also make 'refreshing' of vbd's work (for size
> > changes), and also allow the blkfront driver to import whole disks
> > rather than paritions. We had this functionality on 2.4, but lost it in
> > the move to 2.6.
> >
> > My bet is that it's the 2.6 backend that is where the true perofrmance
> > bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
> > to give good performance under a wide variety of circumstances. Using a
> > 2.6 dom0 is far more pernickety. I agree with Andrew that I suspect it's
> > the work queue changes are biting us when we don't have many outstanding
> > requests.
>
> You never schedule the queues you submit the io against for the 2.6
> kernel, you only have a tq_disk run for 2.4 kernels. This basically puts
> you at the mercy of the timeout unplugging, which is really suboptimal
> unless you can keep the io queue of the target busy at all times.
>
> You need to either mark the last bio going to that device as BIO_SYNC,
> or do a blk_run_queue() on the target queue after having submitted all
> io in this batch for it.
Here is a temporary work-around, this should bring you close to 100%
performance at the cost of some extra unplugs. Uncompiled.
--- blkback.c~ 2005-03-31 09:06:16.000000000 +0200
+++ blkback.c 2005-03-31 09:09:27.000000000 +0200
@@ -481,7 +481,6 @@
for ( i = 0; i < nr_psegs; i++ )
{
struct bio *bio;
- struct bio_vec *bv;
bio = bio_alloc(GFP_ATOMIC, 1);
if ( unlikely(bio == NULL) )
@@ -494,17 +493,12 @@
bio->bi_private = pending_req;
bio->bi_end_io = end_block_io_op;
bio->bi_sector = phys_seg[i].sector_number;
- bio->bi_rw = operation;
- bv = bio_iovec_idx(bio, 0);
- bv->bv_page = virt_to_page(MMAP_VADDR(pending_idx, i));
- bv->bv_len = phys_seg[i].nr_sects << 9;
- bv->bv_offset = phys_seg[i].buffer & ~PAGE_MASK;
+ bio_add_page(bio, virt_to_page(MMAP_VADDR(pending_idx, i)),
+ phys_seg[i].nr_sects << 9,
+ phys_seg[i].buffer & ~PAGE_MASK);
- bio->bi_size = bv->bv_len;
- bio->bi_vcnt++;
-
- submit_bio(operation, bio);
+ submit_bio(operation | (1 << BIO_RW_SYNC), bio);
}
#endif
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 7:10 ` Jens Axboe
@ 2005-03-31 8:17 ` Keir Fraser
2005-03-31 8:19 ` Jens Axboe
0 siblings, 1 reply; 41+ messages in thread
From: Keir Fraser @ 2005-03-31 8:17 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On 31 Mar 2005, at 08:10, Jens Axboe wrote:
> Here is a temporary work-around, this should bring you close to 100%
> performance at the cost of some extra unplugs. Uncompiled.
Yep, this does the job for me. Thanks! Avoiding the extra unplugs is
harder than it sounds as each request in a batch may go to a different
request queue. To minimise the number of unplugs per batch we'd need to
add code to remember which queues we had used in the current batch,
then kick them at the end of the batch. Is there likely to be any
measurable benefit from reducing the number of unplugs?
-- Keir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 8:17 ` Keir Fraser
@ 2005-03-31 8:19 ` Jens Axboe
2005-03-31 14:33 ` Philip R Auld
0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 8:19 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Keir Fraser wrote:
>
> On 31 Mar 2005, at 08:10, Jens Axboe wrote:
>
> >Here is a temporary work-around, this should bring you close to 100%
> >performance at the cost of some extra unplugs. Uncompiled.
>
> Yep, this does the job for me. Thanks! Avoiding the extra unplugs is
> harder than it sounds as each request in a batch may go to a different
> request queue. To minimise the number of unplugs per batch we'd need to
> add code to remember which queues we had used in the current batch,
> then kick them at the end of the batch. Is there likely to be any
Or just keep track of the previous queue, if that has changed unplug the
previous queue and update previous queue variable.
> measurable benefit from reducing the number of unplugs?
Probably not, since the plugging happened at the front end as well. So
you should get a nice stream of io in any way.
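The unplug strategies discussed in this exchange can be compared with a toy Python model. It only counts which queues would be kicked for a given batch; the queue names and function names are made up, and it assumes the work-around unplugs once per submitted request.

```python
def unplug_per_request(batch):
    """Work-around behaviour (assumed): every submitted request triggers an unplug."""
    return list(batch)


def unplug_on_queue_change(batch):
    """Track only the previous queue; unplug it when the target queue changes."""
    unplugs = []
    prev = None
    for queue in batch:
        if prev is not None and queue != prev:
            unplugs.append(prev)
        prev = queue
    if prev is not None:
        unplugs.append(prev)  # kick the last queue at the end of the batch
    return unplugs


def unplug_remembered_queues(batch):
    """Remember every queue used in the batch; kick each one once at the end."""
    seen = []
    for queue in batch:
        if queue not in seen:
            seen.append(queue)
    return seen


batch = ["sda", "sda", "sdb", "sdb", "sda"]  # target queue of each request
for strategy in (unplug_per_request, unplug_on_queue_change, unplug_remembered_queues):
    print(f"{strategy.__name__}: {len(strategy(batch))} unplugs")
```

For this batch the three strategies issue 5, 3 and 2 unplugs respectively. Tracking only the previous queue needs a single variable, at the price of an extra unplug whenever a batch returns to a queue it used earlier.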
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 8:19 ` Jens Axboe
@ 2005-03-31 14:33 ` Philip R Auld
2005-03-31 15:34 ` Kurt Garloff
0 siblings, 1 reply; 41+ messages in thread
From: Philip R Auld @ 2005-03-31 14:33 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
Rumor has it that on Thu, Mar 31, 2005 at 10:19:01AM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Keir Fraser wrote:
>
> > measurable benefit from reducing the number of unplugs?
>
> Probably not, since the plugging happened at the front end as well. So
> you should get a nice stream of io in any way.
This affects merging though, right? I don't think the front
end has done any merging.
Also the BIO_RW_SYNC bit is sometimes ignored in __make_request
due to the bad queue locking interactions with scsi_request_fn.
The bio can be completed before the bio_sync() test in
__make_request. Since there is no other reference to the bio it
can be freed and reused by the time it is tested for BIO_RW_SYNC.
Cheers,
Phil
--
Philip R. Auld, Ph.D. Egenera, Inc.
Software Architect 165 Forest St.
(508) 858-2628 Marlboro, MA 01752
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 14:33 ` Philip R Auld
@ 2005-03-31 15:34 ` Kurt Garloff
2005-03-31 15:39 ` Jens Axboe
2005-03-31 16:53 ` Philip R Auld
0 siblings, 2 replies; 41+ messages in thread
From: Kurt Garloff @ 2005-03-31 15:34 UTC (permalink / raw)
To: Philip R Auld
Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
Christian Limpach
[-- Attachment #1.1: Type: text/plain, Size: 412 bytes --]
Hi,
On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> This effects merging though, right? I don't think the the front
> end has done any merging.
The noop elevator does front and back merging.
My understanding is that it's used in the frontend driver.
Otherwise, unplugging on every block would indeed be quite bad ...
Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:34 ` Kurt Garloff
@ 2005-03-31 15:39 ` Jens Axboe
2005-03-31 15:41 ` Jens Axboe
` (2 more replies)
2005-03-31 16:53 ` Philip R Auld
1 sibling, 3 replies; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 15:39 UTC (permalink / raw)
To: Kurt Garloff
Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Kurt Garloff wrote:
> Hi,
>
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging.
>
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.
>
> Otherwise, unplugging on every block would indeed be quite bad ...
Not necessarily - either your io rate is not fast enough to sustain a
substantial queue depth, in that case you get plugging on basically
every io anyways. If on the other hand the io rate is high enough to
maintain a queue depth of > 1, then the plugging will never take place
because the queue never empties.
So all in all, I don't think the temporary work-around will be such a
bad idea. I would still rather implement the queue tracking though, it
should not be more than a few lines of code.
And Philip, I will get the bio_sync() change merged :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:39 ` Jens Axboe
@ 2005-03-31 15:41 ` Jens Axboe
2005-03-31 16:27 ` Nivedita Singhvi
2005-03-31 15:49 ` Keir Fraser
2005-03-31 16:55 ` Philip R Auld
2 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 15:41 UTC (permalink / raw)
To: Kurt Garloff
Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Jens Axboe wrote:
> On Thu, Mar 31 2005, Kurt Garloff wrote:
> > Hi,
> >
> > On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging.
> >
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
> >
> > Otherwise, unplugging on every block would indeed be quite bad ...
>
> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
>
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.
There are still cases where it will be suboptimal of course, I didn't
intend to claim it will always be as fast as queue tracking! If you are
unlucky enough that the first request will reach the target device and
get started before the next one, you will have a small and a large part
of any given request executed. This isn't good for performance,
naturally. But queueing is so fast, I would be surprised if this
happened much in the real world.
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:39 ` Jens Axboe
2005-03-31 15:41 ` Jens Axboe
@ 2005-03-31 15:49 ` Keir Fraser
2005-03-31 16:02 ` Andrew Theurer
2005-03-31 17:44 ` Jens Axboe
2005-03-31 16:55 ` Philip R Auld
2 siblings, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2005-03-31 15:49 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
Vincent Hanquez, Christian Limpach
On 31 Mar 2005, at 16:39, Jens Axboe wrote:
> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
>
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.
I've checked in something along the lines of what you described into
both the 2.0-testing and the unstable trees. Looks to have identical
performance to the original simple patch, at least for a bulk 'dd'.
-- Keir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:49 ` Keir Fraser
@ 2005-03-31 16:02 ` Andrew Theurer
2005-03-31 17:44 ` Jens Axboe
1 sibling, 0 replies; 41+ messages in thread
From: Andrew Theurer @ 2005-03-31 16:02 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
Vincent Hanquez, Jens Axboe, Christian Limpach
Keir Fraser wrote:
>
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
>
>> Not necessarily - either your io rate is not fast enough to sustain a
>> substantial queue depth, in that case you get plugging on basically
>> every io anyways. If on the other hand the io rate is high enough to
>> maintain a queue depth of > 1, then the plugging will never take place
>> because the queue never empties.
>>
>> So all in all, I don't think the temporary work-around will be such a
>> bad idea. I would still rather implement the queue tracking though, it
>> should not be more than a few lines of code.
>
>
> I've checked in something along the lines of what you described into
> both the 2.0-testing and the unstable trees. Looks to have identical
> performance to the original simple patch, at least for a bulk 'dd'.
I'll do a pull of unstable and see what I get with o_direct, thanks.
-Andrew
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:41 ` Jens Axboe
@ 2005-03-31 16:27 ` Nivedita Singhvi
2005-03-31 17:43 ` Jens Axboe
2005-03-31 18:27 ` Kurt Garloff
0 siblings, 2 replies; 41+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 16:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
Vincent Hanquez, Christian Limpach
Jens Axboe wrote:
> There are still cases where it will be suboptimal of course, I didn't
> intend to claim it will always be as fast as queue tracking! If you are
> unlucky enough that the first request will reach the target device and
> get started before the next one, you will have a small and a large part
> of any given request executed. This isn't good for performance,
> naturally. But queueing is so fast, I would be surprised if this
> happened much in the real world.
Although the usual answer for what scheduling algorithm is
best is almost always "depends on the workload", it was
suggested to me that the cfq was still the best option to
go with. What do people feel about that? (Or is AS going
to remain default?).
Also, we're making the assumption here that guest OS = virtual
driver/device. I would rather we not make that assumption
always. This may be moot because I was also told there might
be a patch floating around (-mm ?) that allows you to
select scheduling algorithm on a per-device basis. Anyone
know if this is going to come in anytime soon?
thanks,
Nivedita
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:34 ` Kurt Garloff
2005-03-31 15:39 ` Jens Axboe
@ 2005-03-31 16:53 ` Philip R Auld
2005-03-31 18:01 ` Jens Axboe
1 sibling, 1 reply; 41+ messages in thread
From: Philip R Auld @ 2005-03-31 16:53 UTC (permalink / raw)
To: Kurt Garloff
Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
Christian Limpach
Rumor has it that on Thu, Mar 31, 2005 at 05:34:49PM +0200 Kurt Garloff said:
> Hi,
>
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging.
>
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.
If that is the case, it can only merge things that are
machine contiguous. Current guests know this mapping, but
can they get it when running unmodified with VT-x?
My experience showed very little if any multipage
IO coming out of the front end.
>
> Otherwise, unplugging on every block would indeed be quite bad ...
Seems to be somewhat moot anyway given the current change planned :)
Cheers,
Phil
>
> Regards,
> --
> Kurt Garloff, Director SUSE Labs, Novell Inc.
--
Philip R. Auld, Ph.D. Egenera, Inc.
Software Architect 165 Forest St.
(508) 858-2628 Marlboro, MA 01752
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:39 ` Jens Axboe
2005-03-31 15:41 ` Jens Axboe
2005-03-31 15:49 ` Keir Fraser
@ 2005-03-31 16:55 ` Philip R Auld
2 siblings, 0 replies; 41+ messages in thread
From: Philip R Auld @ 2005-03-31 16:55 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
Rumor has it that on Thu, Mar 31, 2005 at 05:39:26PM +0200 Jens Axboe said:
>
> And Philip, I will get the bio_sync() change merged :-)
Thanks! It's good to be transparent ;)
Phil
>
> --
> Jens Axboe
--
Philip R. Auld, Ph.D. Egenera, Inc.
Software Architect 165 Forest St.
(508) 858-2628 Marlboro, MA 01752
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 16:27 ` Nivedita Singhvi
@ 2005-03-31 17:43 ` Jens Axboe
2005-03-31 18:27 ` Kurt Garloff
1 sibling, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 17:43 UTC (permalink / raw)
To: Nivedita Singhvi
Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
Vincent Hanquez, Christian Limpach
On Thu, Mar 31 2005, Nivedita Singhvi wrote:
> Jens Axboe wrote:
>
> >There are still cases where it will be suboptimal of course, I didn't
> >intend to claim it will always be as fast as queue tracking! If you are
> >unlucky enough that the first request will reach the target device and
> >get started before the next one, you will have a small and a large part
> >of any given request executed. This isn't good for performance,
> >naturally. But queueing is so fast, I would be surprised if this
> >happened much in the real world.
>
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).
Really the only one that you should not use is AS, anything else will be
fine. AS should only ever be used at the bottom of the stack, if on a
single spindle backing. CFQ will be fine, as will deadline and noop.
> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone
> know if this is going to come in anytime soon?
That patch is in mainline since 2.6.10. You can change schedulers by
echoing the preferred scheduler to /sys/block/<device>/queue/scheduler -
reading that file will show you what schedulers are available.
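For anyone scripting this from a tool rather than the shell, here is a minimal C sketch of reading that file and picking out the active scheduler. It assumes the active entry is the one shown in square brackets (as in output like "noop anticipatory deadline [cfq]"); the function names active_scheduler() and read_active_scheduler() are made up for illustration and are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) scheduler name from one
 * line of scheduler-file text into `out`.
 * Returns 0 on success, -1 if no bracketed name is found or it does not fit. */
int active_scheduler(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);

    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (rb == NULL)
        return -1;

    size_t len = (size_t)(rb - lb - 1);
    if (len == 0 || len >= outsz)
        return -1;

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: read /sys/block/<dev>/queue/scheduler and report the
 * active scheduler for that device. Returns 0 on success, -1 on any error. */
int read_active_scheduler(const char *dev, char *out, size_t outsz)
{
    char path[256];
    char line[256];

    int n = snprintf(path, sizeof path, "/sys/block/%s/queue/scheduler", dev);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? active_scheduler(line, out, outsz) : -1;
}
```

Calling read_active_scheduler("hda", buf, sizeof buf) on a 2.6.10 or later kernel should fill buf with the current scheduler name; changing the scheduler is still done by writing the preferred name to the same file.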
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 15:49 ` Keir Fraser
2005-03-31 16:02 ` Andrew Theurer
@ 2005-03-31 17:44 ` Jens Axboe
1 sibling, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 17:44 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
Vincent Hanquez, Christian Limpach
On Thu, Mar 31 2005, Keir Fraser wrote:
>
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
>
> >Not necessarily - either your io rate is not fast enough to sustain a
> >substantial queue depth, in that case you get plugging on basically
> >every io anyways. If on the other hand the io rate is high enough to
> >maintain a queue depth of > 1, then the plugging will never take place
> >because the queue never empties.
> >
> >So all in all, I don't think the temporary work-around will be such a
> >bad idea. I would still rather implement the queue tracking though, it
> >should not be more than a few lines of code.
>
> I've checked in something along the lines of what you described into
> both the 2.0-testing and the unstable trees. Looks to have identical
> performance to the original simple patch, at least for a bulk 'dd'.
Can you post the patch here for review? Or just point me somewhere I can
view it.
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 16:53 ` Philip R Auld
@ 2005-03-31 18:01 ` Jens Axboe
2005-03-31 18:43 ` Philip R Auld
0 siblings, 1 reply; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 18:01 UTC (permalink / raw)
To: Philip R Auld
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging.
> >
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
>
> If that is the case, it can only merge things that are
> machine contiguous. Current guests know this mapping, but
> can they get this when running unmodified with VT-x.
>
> My experience showed very little if any multipage
> IO coming out of the front end.
There aren't that many users of multipage ios yet. direct io will use
it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
definitely improving :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 16:27 ` Nivedita Singhvi
2005-03-31 17:43 ` Jens Axboe
@ 2005-03-31 18:27 ` Kurt Garloff
2005-03-31 21:59 ` Nivedita Singhvi
1 sibling, 1 reply; 41+ messages in thread
From: Kurt Garloff @ 2005-03-31 18:27 UTC (permalink / raw)
To: Nivedita Singhvi
Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
Vincent Hanquez, Jens Axboe, Christian Limpach
Hi Niv,
On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).
This is a different discussion.
But, yes, I would agree that CFQ (v3) is the best default choice.
Jens, should we maybe make sure that the blockback driver uses
different (fake) UIDs for the domains that it serves, to provide
fairness between them? The next step would be to allow tweaking
IO priorities. Or, to make it more general, add a parameter (call
it uid) that a block driver can pass down to the IO scheduler,
which would normally be current->uid but may be set differently?
> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone
It's part of 2.6.11.
garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
noop anticipatory deadline [cfq]
Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 18:01 ` Jens Axboe
@ 2005-03-31 18:43 ` Philip R Auld
2005-03-31 19:07 ` Keir Fraser
2005-03-31 19:21 ` Jens Axboe
0 siblings, 2 replies; 41+ messages in thread
From: Philip R Auld @ 2005-03-31 18:43 UTC (permalink / raw)
To: Jens Axboe
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Philip R Auld wrote:
> >
> > My experience showed very little if any multipage
> > IO coming out of the front end.
>
> There aren't that many users of multipage ios yet. direct io will use
> it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> definitely improving :-)
Sorry, I was being sloppy with terminology :)
What I was getting at was that the backend will split requests
up and issue each physical segment as a separate bio (at least in
the 2.0.5 tree I have in front of me). And that none of these
physical segments was more than 1 page.
So the request merging in the back end OS is important, no?
Cheers,
Phil
>
> --
> Jens Axboe
--
Philip R. Auld, Ph.D. Egenera, Inc.
Software Architect 165 Forest St.
(508) 858-2628 Marlboro, MA 01752
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 18:43 ` Philip R Auld
@ 2005-03-31 19:07 ` Keir Fraser
2005-03-31 19:10 ` Keir Fraser
2005-03-31 19:20 ` Jens Axboe
2005-03-31 19:21 ` Jens Axboe
1 sibling, 2 replies; 41+ messages in thread
From: Keir Fraser @ 2005-03-31 19:07 UTC (permalink / raw)
To: Philip R Auld
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Jens Axboe, Christian Limpach
> What I was getting at was that the backend will split requests
> up and issue each physical segment as a separate bio (at least in
> the 2.0.5 tree I have in front of me). And that none of these
> physical segments was more than 1 page.
>
> So the request merging in the back end OS is important, no?
Ah, this reminds me I have one more question for Jens.
Since all the bio's that I queue up in a single invocation of
dispatch_rw_block_io() will actually be adjacent to each other (because
they're all from the same scatter-gather list) can I actually do
something like (very roughly):
bio = bio_alloc(GFP_KERNEL, nr_psegs);
for ( i = 0; i < nr_psegs; i++ )
bio_add_page(bio, blah...);
submit_bio(operation, bio);
Each of the biovecs that I queue may not be a full page in size (but
won't straddle a page boundary of course).
This would avoid the bio's having to be merged again later.
-- Keir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 19:07 ` Keir Fraser
@ 2005-03-31 19:10 ` Keir Fraser
2005-03-31 19:20 ` Jens Axboe
1 sibling, 0 replies; 41+ messages in thread
From: Keir Fraser @ 2005-03-31 19:10 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
Vincent Hanquez, Jens Axboe, Christian Limpach
On 31 Mar 2005, at 20:07, Keir Fraser wrote:
> Since all the bio's that I queue up in a single invocation of
> dispatch_rw_block_io() will actually be adjacent to each other
> (because they're all from the same scatter-gather list)
I should add: I know that the code makes it look like each s-g element
might map somewhere entirely different from the previous one, but we no
longer support that mode of operation. Each VBD now always maps onto a
single, entire block device or partition.
-- Keir
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 19:07 ` Keir Fraser
2005-03-31 19:10 ` Keir Fraser
@ 2005-03-31 19:20 ` Jens Axboe
1 sibling, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 19:20 UTC (permalink / raw)
To: Keir Fraser
Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
Vincent Hanquez, Christian Limpach
On Thu, Mar 31 2005, Keir Fraser wrote:
> >What I was getting at was that the backend will split requests
> >up and issue each physical segment as a separate bio (at least in
> >the 2.0.5 tree I have in front of me). And that none of these
> >physical segments was more than 1 page.
> >
> >So the request merging in the back end OS is important, no?
>
> Ah, this reminds me I have one more question for Jens.
>
> Since all the bio's that I queue up in a single invocation of
> dispatch_rw_block_io() will actually be adjacent to each other (because
> they're all from the same scatter-gather list) can I actually do
> something like (very roughly):
>
> bio = bio_alloc(GFP_KERNEL, nr_psegs);
> for ( i = 0; i < nr_psegs; i++ )
> bio_add_page(bio, blah...);
> submit_bio(operation, bio);
>
> Each of the biovecs that I queue may not be a full page in size (but
> won't straddle a page boundary of course).
Yes, this is precisely what you should do, the current method is pretty
suboptimal. Basically allocate a bio with nr_psegs, and call
bio_add_page() for each page until it returns _less_ than the number of
bytes you requested. When it does that, submit that bio for io and
allocate a new bio with nr_psegs-submitted_segs bio_vecs attached.
Continue until you are done.
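To make the control flow concrete, here is a userspace sketch of that loop. It is not kernel code: struct mock_bio and mock_bio_add_page() are stand-ins invented for illustration, with the "device limit" modelled as a simple cap on segments per bio, and the mock returns 0 (less than requested) when a segment does not fit. Only the shape of the loop is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a bio: it only tracks how many segments fit. */
struct mock_bio {
    int max_vecs;   /* capacity asked for at "allocation" time */
    int limit;      /* simulated per-bio segment limit of the device */
    int nr_vecs;    /* segments added so far */
    size_t bytes;   /* payload accumulated so far */
};

/* Mock of the add-page contract described above: returns len when the
 * segment was added, and less than len (here 0) when the bio is full. */
static size_t mock_bio_add_page(struct mock_bio *bio, size_t len)
{
    assert(len > 0);
    if (bio->nr_vecs >= bio->max_vecs || bio->nr_vecs >= bio->limit)
        return 0;
    bio->nr_vecs++;
    bio->bytes += len;
    return len;
}

/* Pack nr_psegs adjacent segments into as few mock bios as the limit allows.
 * Returns the number of bios that would have been submitted. */
int submit_segments(const size_t *seg_len, int nr_psegs, int device_limit)
{
    assert(seg_len != NULL && nr_psegs > 0 && device_limit > 0);

    int submitted_bios = 0;
    int i = 0;

    while (i < nr_psegs) {
        /* "Allocate" a bio sized for the segments still outstanding. */
        struct mock_bio bio = { nr_psegs - i, device_limit, 0, 0 };

        /* Keep adding until the bio refuses a segment or we run out. */
        while (i < nr_psegs &&
               mock_bio_add_page(&bio, seg_len[i]) == seg_len[i])
            i++;

        /* The bio is full, or all segments are placed: "submit" it. */
        submitted_bios++;
    }
    return submitted_bios;
}
```

With eight one-page segments and no effective limit this yields a single bio, compared with the eight separate bios the current per-segment approach would issue.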
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 18:43 ` Philip R Auld
2005-03-31 19:07 ` Keir Fraser
@ 2005-03-31 19:21 ` Jens Axboe
1 sibling, 0 replies; 41+ messages in thread
From: Jens Axboe @ 2005-03-31 19:21 UTC (permalink / raw)
To: Philip R Auld
Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
Christian Limpach
On Thu, Mar 31 2005, Philip R Auld wrote:
> Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> > On Thu, Mar 31 2005, Philip R Auld wrote:
> > >
> > > My experience showed very little if any multipage
> > > IO coming out of the front end.
> >
> > There aren't that many users of multipage ios yet. direct io will use
> > it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> > definitely improving :-)
>
> Sorry, I was being sloppy with terminology :)
>
> What I was getting at was that the backend will split requests
> up and issue each physical segment as a separate bio (at least in
> the 2.0.5 tree I have in front of me). And that none of these
> physical segments was more than 1 page.
>
> So the request merging in the back end OS is important, no?
I suppose it always is, since the merge criteria may have changed from
when the io was initially queued. If requests are always split into
single pages, then it becomes very important to merge at the backend.
--
Jens Axboe
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: poor domU VBD performance.
2005-03-31 18:27 ` Kurt Garloff
@ 2005-03-31 21:59 ` Nivedita Singhvi
0 siblings, 0 replies; 41+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 21:59 UTC (permalink / raw)
To: Kurt Garloff
Cc: Ian Pratt, Xen development list, Philip R Auld, Vincent Hanquez,
Jens Axboe, Christian Limpach
Kurt Garloff wrote:
> Hi Niv,
>
> On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
>
>>Although the usual answer for what scheduling algorithm is
>>best is almost always "depends on the workload", it was
>>suggested to me that the cfq was still the best option to
>>go with. What do people feel about that? (Or is AS going
>>to remain default?).
>
>
> This is a different dicussion.
Yes, I did change the subject a little ;).
> But, yes, I would agree that CFQ (v3) is the best default choice.
Yep, even though some of the complications in the Xen
environment (as you point out below) will have to be addressed.
> Jens, should we maybe make sure that the blockback driver does use
> different (fake) UIDs for the domains that it serves to provide
> the fairness between them. Next step would be to allow to tweak
> IO priorities. Or, to make it more general, add a parameter (call
> it uid), that a block driver can pass down to the IO scheduler
> and that would normally be current->uid but may be set differently?
> It's part of 2.6.11.
> garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
> noop anticipatory deadline [cfq]
I just saw Jens' reply as well. This is much goodness :).
Very handy indeed!
thanks,
Nivedita
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-04-01 17:46 Ian Pratt
0 siblings, 0 replies; 41+ messages in thread
From: Ian Pratt @ 2005-04-01 17:46 UTC (permalink / raw)
To: peter bier, xen-devel
> Now I have switched back to the filesystem operations. I do
> this by copying a "/usr" subtree from a slackware-10.0
> installation containing about 750 MB in 2200 directories and
> 37000 files. Copying these files with the target directory on
> the same device as the source directory, I get between 90 and
> 93% of the Dom0 performance when I work in DomU. When
> copying from a directory on one device into a directory on
> another device, performance in DomU lags further behind that of
> Dom0. It's only 50 to 60 percent of the Dom0 performance. The
> performance is less than it is when using only one disk. I
> found out that the sum of the busy percentages of the two disks as
> reported by iostat on Dom0 is always slightly above 100%.
> Does this reflect that the reading and the writing both go
> through the VBD driver? Neither device is ever 100% busy.
The latest 2.0-testing tree has some further blk queue plugging
enhancements along with a fix for another nasty performance bug. It
would be interesting to know whether that improves things.
It's possible that the blkring currently just isn't big enough if you're
trying to drive multiple devices with independent requests.
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Re: poor domU VBD performance.
2005-04-02 19:54 ` peter bier
@ 2005-04-03 15:27 ` Cédric Schieli
0 siblings, 0 replies; 41+ messages in thread
From: Cédric Schieli @ 2005-04-03 15:27 UTC (permalink / raw)
To: peter bier; +Cc: xen-devel
I can confirm the problem only occurs on SATA.
I've added an old IDE UDMA66 drive, created an LVM volume on it and ran
the same dd tests:
Dom0 : 12 MB/s
DomU : 12 MB/s
> I just stumbled across the fact that you are using a SATA disk, Cédric. This
> is exactly the "dd" behavior that my system containing SATA disks still shows.
> But it applies only to "dd" (which, admittedly, is read-only). It does not
> apply to the performance figures I got when copying my "/usr" tree, as
> described in a previous post here, from one location on the disk to another
> location on the same disk (which, of course, is combined read-write on the
> same device). Hence it might be possible that my limited performance when
> copying from one disk to another is in fact an effect of reduced read
> performance in DomU on a SATA disk.
>
> I suspect that this might be an effect specific to SATA disks. I will verify
> this on Monday, when I have access to my computers in the office, by doing it
> on a system with two IDE disks. I will report it then, if your problem is still
> open.
>
> I will describe the exact configuration of the systems then (motherboard, IO
> controller, etc.).
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-04-03 16:38 Ian Pratt
2005-04-04 19:13 ` Cédric Schieli
0 siblings, 1 reply; 41+ messages in thread
From: Ian Pratt @ 2005-04-03 16:38 UTC (permalink / raw)
To: Cédric Schieli, peter bier; +Cc: xen-devel
> I can confirm the problem only occurs on SATA.
> I've added an old IDE UDMA66 drive, created LVM volume from
> it and ran same dd tests :
> Dom0 : 12 MB/s
> DomU : 12 MB/s
SATA works fine for me on 2.0-testing.
I get 50MB/s reading from a raw partition in both cases using:
time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024
Can you try a raw partition rather than LVM?
Thanks,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
2005-04-03 16:38 Ian Pratt
@ 2005-04-04 19:13 ` Cédric Schieli
0 siblings, 0 replies; 41+ messages in thread
From: Cédric Schieli @ 2005-04-04 19:13 UTC (permalink / raw)
To: Ian Pratt; +Cc: peter bier, xen-devel
> SATA works fine for me on 2.0-testing.
> I get 50MB/s reading from a raw partition in both cases using:
> time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024
I've tried with a raw partition (the same one that holds the LVM volume) and
got the same results: 51 MB/s on Dom0 and 37 MB/s on DomU.
I don't know if it is of importance, but I need to add
ignorebiostables=1 to my boot parameters in order to make SATA work
(the kernel hangs on drive detection without it). The SATA controller is a
VIA one.
Cédric Schieli
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-04-04 19:36 Ian Pratt
2005-04-04 22:35 ` Nicholas Lee
0 siblings, 1 reply; 41+ messages in thread
From: Ian Pratt @ 2005-04-04 19:36 UTC (permalink / raw)
To: Cédric Schieli; +Cc: peter bier, xen-devel
> > SATA works fine for me on 2.0-testing.
> > I get 50MB/s reading from a raw partition in both cases using:
> > time dd if=/dev/sda6 of=/dev/null bs=1024k count=1024
>
> I've tried with a raw partition (the same that holds the LVM
> volume) and got same results : 51 MB/s on Dom0 and 37 MB/s on DomU
>
> I don't know if it is of importance, but I need to add
> ignorebiostables=1 in my boot parameters in order to make the
> SATA work (kernel hang on drive detection without it). The
> SATA controller is a VIA one.
It doesn't sound like Xen is too happy on your system, but it's not clear how this would explain the performance difference between dom0 and domU.
When the IOAPIC patches are checked in it will be interesting to see whether this fixes it. Try the unstable tree in a week or so.
Best,
Ian
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Re: poor domU VBD performance.
2005-04-04 19:36 Ian Pratt
@ 2005-04-04 22:35 ` Nicholas Lee
0 siblings, 0 replies; 41+ messages in thread
From: Nicholas Lee @ 2005-04-04 22:35 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Some simple non-scientific additions to the performance numbers. IBM
x335/MPT SCSI.
Previously on 2.0.5/Testing on 2.6.10:
[nic@stateless:~/sys/xen] sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 2884 MB in 2.00 seconds = 1442.00 MB/sec
Timing buffered disk reads: 100 MB in 3.05 seconds = 32.79 MB/sec
Not completely happy with the buffered read figure of 34 MB/sec. I'm
putting together a new x205 server with Xen later today. I'll try to do
some native vs Xen testing while I'm at it.
dom0:
[nic@stateless:~/tmp] time sudo cp db-svn.tgz db-svn-bak.tgz
real 0m13.058s
user 0m0.030s
sys 0m0.530s
domU:
[nic@base:/export/bak] time sudo cp db-svn.tgz db-svn-bak.tgz
real 0m23.574s
user 0m0.010s
sys 0m0.060s
[nic@stateless:~/tmp] ls -l db-svn.tgz
-rw-r--r-- 1 nic nic 188247603 2005-04-04 21:06 db-svn.tgz
With today's 2.0.6/Testing on 2.6.11.6:
[nic@stateless:~] sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 2748 MB in 2.00 seconds = 1374.00 MB/sec
Timing buffered disk reads: 102 MB in 3.00 seconds = 34.00 MB/sec
[nic@stateless:~/tmp] time sudo cp db-svn.tgz db-svn-bak.tgz
real 0m10.468s
user 0m0.010s
sys 0m0.070s
[nic@base:/export/bak] time sudo cp db-svn.tgz db-svn-bak.tgz
real 0m11.243s
user 0m0.000s
sys 0m0.040s
Both filesystems are based on XFS/LVM2. These numbers come from a single
run right after boot, with just one domU running in the domU case.
So definite improvement.
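For comparing these against the dd figures elsewhere in the thread, a small C sketch for converting the cp timings into throughput and a domU/dom0 ratio. The helper names mb_per_sec() and domu_vs_dom0() are made up for illustration, MB here means 10^6 bytes, and wall-clock cp time includes page-cache effects, so treat the results as rough.

```c
#include <assert.h>

/* Hypothetical helper: throughput in MB/s (1 MB = 10^6 bytes) for copying
 * `bytes` bytes in `seconds` of wall-clock time. */
double mb_per_sec(long long bytes, double seconds)
{
    assert(bytes >= 0 && seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}

/* Hypothetical helper: domU speed as a fraction of dom0 speed for the same
 * amount of data, computed from the two elapsed times. */
double domu_vs_dom0(double dom0_seconds, double domu_seconds)
{
    assert(dom0_seconds > 0.0 && domu_seconds > 0.0);
    return dom0_seconds / domu_seconds;
}
```

Fed with the 188247603-byte file and the real times above, this gives roughly 14.4 MB/s (dom0) against 8.0 MB/s (domU) on the older build, a ratio of about 0.55, and roughly 18.0 MB/s against 16.7 MB/s on the newer build, a ratio of about 0.93.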
Nicholas
^ permalink raw reply [flat|nested] 41+ messages in thread
* RE: Re: poor domU VBD performance.
@ 2005-04-12 10:51 Ian Pratt
0 siblings, 0 replies; 41+ messages in thread
From: Ian Pratt @ 2005-04-12 10:51 UTC (permalink / raw)
To: peter bier, xen-devel
> I am sorry to return to this issue after quite a long interruption.
> As I mentioned in an earlier post, I came across this problem
> when I was testing file-system performance. After the
> problems with raw sequential I/O seemed to have been fixed in
> the testing release, I turned back to my original problem.
> I did a simple test that, despite its simplicity, seems to put
> the IO subsystem under considerable stress. I took the /usr
> tree of my system and copied it five times into different
> directories on a slice of disk 1. This tree consists of
> 36000 files with about 750 MB of data. Then I started to copy
> each of these copies recursively onto disk 2 (each to its
> own location on that disk, of course). I ran these copies
> in parallel and the processes took about 6 to 7 minutes in
> DOM0, while they needed between 14.6 and 15.9 minutes in DOMU.
>
> Essentially, this means that under this heavy IO load on the
> system I get back to the 40% ratio between IO performance on
> DOMU and IO performance on DOM0 that I initially
> reported. This may just be coincidence, but it is probably
> worth mentioning.
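The test described in the quoted text could be scripted roughly as follows, so that the same workload can be run in dom0 and domU. This is only a sketch under assumptions: the original poster did not say which commands were used, so `cp -R` is a guess, and the function names and directory layout are hypothetical. Each source tree is copied by its own `cp` process, all started at the same time, and the wall-clock time of each copy is returned.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def timed_copy(src, dst):
    """Recursively copy src to dst (which must not exist yet) with cp -R; return seconds taken."""
    start = time.monotonic()
    subprocess.run(["cp", "-R", str(src), str(dst)], check=True)
    return time.monotonic() - start


def parallel_copy(sources, dest_root):
    """Copy every tree in `sources` into dest_root concurrently.

    Each tree goes to its own location under dest_root, named after the
    source directory. Returns the per-copy durations in the order given.
    """
    dest_root = Path(dest_root)
    dest_root.mkdir(parents=True, exist_ok=True)
    jobs = [(Path(s), dest_root / Path(s).name) for s in sources]
    # One worker thread per copy; each thread just waits on its cp process.
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(timed_copy, s, d) for s, d in jobs]
        return [f.result() for f in futures]
```

For a run comparable to the one described, `sources` would be the five copies of /usr on disk 1 and `dest_root` a directory on disk 2. The page cache should be cold in both domains (for example by running right after boot), otherwise the dom0 and domU numbers are not comparable.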
It's possible that the dom0 doing prefetch as well as the domU is
messing up random IO performance. Do the iostat numbers suggest dom0 is
reading more data overall when doing it on behalf of a domU?
We'll need a simpler way of reproducing this if any headway is to be
made debugging it.
It might be worth writing a program to do pseudo-random IO reads to a
partition, both in DIRECT and normal mode, then run it in dom0 and domU.
[Chris: you have such a program already, right? Can you post it? Thanks.]
Ian
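A program of the kind suggested above might look like the sketch below. It is not the program referred to in the message, which is not shown in this thread; the function name, parameters and defaults are assumptions chosen for illustration. It issues block-aligned reads at pseudo-random offsets, either through the page cache (normal mode) or with O_DIRECT, and reports bytes read and elapsed time. O_DIRECT is Linux-specific and needs an aligned buffer, so an anonymous mmap (page-aligned) is used as the read buffer.

```python
import mmap
import os
import random
import time


def random_read_bench(path, block_size=4096, count=1000, direct=False, seed=0):
    """Read `count` blocks of `block_size` bytes at pseudo-random aligned offsets.

    Returns (total_bytes_read, elapsed_seconds). With direct=True the file or
    partition is opened with O_DIRECT (Linux only), bypassing the page cache.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT
    fd = os.open(path, flags)
    try:
        # Seeking to the end gives the size for regular files and block devices.
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // block_size
        if blocks == 0:
            raise ValueError("target is smaller than one block")
        rng = random.Random(seed)  # fixed seed: same offsets in dom0 and domU
        total = 0
        # Anonymous mappings are page-aligned, which satisfies O_DIRECT.
        with mmap.mmap(-1, block_size) as buf:
            start = time.monotonic()
            for _ in range(count):
                offset = rng.randrange(blocks) * block_size
                total += os.preadv(fd, [buf], offset)
            elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return total, elapsed
```

Running it with the same seed against the same partition in dom0 and in domU, once with `direct=False` and once with `direct=True`, would give four numbers to compare. If the buffered case is disproportionately slower in domU while the direct case is not, that would point towards the double-prefetch explanation; the block size must be a multiple of the device's logical sector size for the direct case to work.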
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2005-04-12 10:51 UTC | newest]
Thread overview: 41+ messages
2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
2005-03-30 17:01 ` peter bier
2005-03-30 18:05 ` Andrew Theurer
2005-03-31 7:05 ` RE: " Jens Axboe
2005-03-31 7:10 ` Jens Axboe
2005-03-31 8:17 ` Keir Fraser
2005-03-31 8:19 ` Jens Axboe
2005-03-31 14:33 ` Philip R Auld
2005-03-31 15:34 ` Kurt Garloff
2005-03-31 15:39 ` Jens Axboe
2005-03-31 15:41 ` Jens Axboe
2005-03-31 16:27 ` Nivedita Singhvi
2005-03-31 17:43 ` Jens Axboe
2005-03-31 18:27 ` Kurt Garloff
2005-03-31 21:59 ` Nivedita Singhvi
2005-03-31 15:49 ` Keir Fraser
2005-03-31 16:02 ` Andrew Theurer
2005-03-31 17:44 ` Jens Axboe
2005-03-31 16:55 ` Philip R Auld
2005-03-31 16:53 ` Philip R Auld
2005-03-31 18:01 ` Jens Axboe
2005-03-31 18:43 ` Philip R Auld
2005-03-31 19:07 ` Keir Fraser
2005-03-31 19:10 ` Keir Fraser
2005-03-31 19:20 ` Jens Axboe
2005-03-31 19:21 ` Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2005-04-12 10:51 Ian Pratt
2005-04-04 19:36 Ian Pratt
2005-04-04 22:35 ` Nicholas Lee
2005-04-03 16:38 Ian Pratt
2005-04-04 19:13 ` Cédric Schieli
2005-04-01 23:22 Ian Pratt
2005-04-02 10:36 ` Cédric Schieli
2005-04-02 19:54 ` peter bier
2005-04-03 15:27 ` Cédric Schieli
2005-04-01 17:46 Ian Pratt
2005-03-29 14:19 Ian Pratt
2005-03-29 13:38 Ian Pratt
2005-03-29 14:28 ` B.G. Bruce
2005-03-29 8:13 Ian Pratt
2005-03-29 18:39 ` Andrew Theurer
2005-03-29 19:13 ` Steven Hand
2005-03-28 20:14 Ian Pratt
2005-03-28 21:48 ` Andrew Theurer
2005-03-28 23:38 ` Peter Bier
2005-03-29 0:27 ` Andrew Theurer
2005-03-29 13:34 ` Kurt Garloff