poor domU VBD performance.

* poor domU VBD performance.
@ 2005-03-26 18:14 Peter Bier
  0 siblings, 0 replies; 60+ messages in thread
From: Peter Bier @ 2005-03-26 18:14 UTC (permalink / raw)
  To: xen-devel

I have installed XEN and linux 2.6.10 on three different machines. The slowest 
of them is my computer at home, running an Athlon XP 1600+ ( 1.4 GHz ) with 256 
MB RAM. 

My problem is reduced file-system performance in domU guests. These guests run
faster when I use loopback files on Dom0 than they do when I use real partitions
and populate them with a linux system. 

I found that file-system IO and raw IO in dom0 ( using dd as a tool to test
throughput from the disk ) are almost exactly the same as with a standard 
linux kernel without XEN. But the raw IO from DomU to an unused disk ( a second
disk in the system ) is limited to forty percent of the speed I get within Dom0.
About the same ratio shows up when doing real file-system IO.

I found this symptom on all of the systems I installed. An early paper about 
XEN describes the penalty when using VBDs as close to zero and negligible.
This conflicts with the results I got, and I believe it means that 
something in my configuration is wrong ( at least I hope so ). 

I have the drivers for my chipset linked into the kernel and hdparm tells me that
DMA is enabled for the used disks ( using hdparm under Dom0 ). 

What worries me is that the results within Dom0 are completely satisfactory, 
while those in DomU are not. Do I have to change the kernel config for DomU ? Or
is there any special option I have to set in the kernel configuration for Dom0 or
even for xen?

I have compiled version 2.0.5 - the newest available, to my knowledge.

Any hints ??  

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: poor domU VBD performance.
@ 2005-03-27 17:41 Ian Pratt
  2005-03-28  8:48 ` peter bier
                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Ian Pratt @ 2005-03-27 17:41 UTC (permalink / raw)
  To: Peter Bier, xen-devel; +Cc: ian.pratt

> I found out that dom0 does file-system IO and raw IO ( using 
> dd as a tool to test
> throughput from the disk ) is about exactly the same as when 
> using a standard 
> linux kernel without XEN. But the raw IO from DomU to an 
> unused disk ( a second
> disk in the system ) is limited to fourty percent of the 
> speed I get within Dom0.

Just to be clear: you're doing a dd performance test within dom0 to the
exact same partition on the 2nd disk that you're using when you start
the domU and finding that the domU 'dd' performance is 40% of the dom0
performance?

I've not heard of anyone else having problems like this. What happens if
you use a partition on the 1st disk?

What chipset is the IDE controller? What device (e.g. sda1) are you
exporting the disk partition into the domU as?

Are you sure dom 0 is idle when doing the dd test in the domU?

Ian

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-27 17:41 Ian Pratt
@ 2005-03-28  8:48 ` peter bier
  2005-03-28 12:44 ` peter bier
  2005-03-29  6:20 ` Pasi Kärkkäinen
  2 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-28  8:48 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> 
> > I found out that dom0 does file-system IO and raw IO ( using 
> > dd as a tool to test
> > throughput from the disk ) is about exactly the same as when 
> > using a standard 
> > linux kernel without XEN. But the raw IO from DomU to an 
> > unused disk ( a second
> > disk in the system ) is limited to fourty percent of the 
> > speed I get within Dom0.
> 
> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU and finding that the domU 'dd' performance is 40% of the dom0
> performance?
> 
> I've not heard of anyone else having problems like this. What happens if
> you use a partition on the 1st disk?
> 
> What chipset is the IDE controller? What device (e.g. sda1) are you
> exporting the disk partition into the domU as?
> 
> Are you sure dom 0 is idle when doing the dd test in the domU?
> 
> Ian
> 

Yes, I have tried various partitions from both Dom0 and DomU on both disks, and
the result has always been a performance ratio of 2.5 between Dom0 and DomU. 
Yes, I used dd for the test. But I came across this problem doing IO into 
the filesystem. I was surprised that I not only got no improvement when
switching from a loopback file as the "device" for DomU to a real device, but
actually got a performance degradation. 
Because of that effect I started to test raw IO performance using dd.

I am sure that the device was not busy and dom0 was idle when I did the test. 
There were no busy jobs in dom0, neither CPU- nor IO-bound. I don't know
which chipset the IDE controller is. My mainboard is an MSI KT7 board. I am
currently not at home, so I must look up what the IDE controller is. 

The devices I exported to have been hda1 and hdb6 on my computer at home and
hdg5 in the office. In the latter case the disk is attached to a Promise202
raid controller. 

Is there any description of what I have to do to configure my system adequately 
to run efficiently using Xen? If such a description were available I might be 
able to locate the problem myself.

I have not yet done a "dd performance" test using loopback files as devices.
I have only used them as filesystems. 

Thanks in advance 
          Peter Bier 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-27 17:41 Ian Pratt
  2005-03-28  8:48 ` peter bier
@ 2005-03-28 12:44 ` peter bier
  2005-03-29  6:20 ` Pasi Kärkkäinen
  2 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-28 12:44 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> 
> > I found out that dom0 does file-system IO and raw IO ( using 
> > dd as a tool to test
> > throughput from the disk ) is about exactly the same as when 
> > using a standard 
> > linux kernel without XEN. But the raw IO from DomU to an 
> > unused disk ( a second
> > disk in the system ) is limited to fourty percent of the 
> > speed I get within Dom0.
> 
> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU and finding that the domU 'dd' performance is 40% of the dom0
> performance?
> 
> I've not heard of anyone else having problems like this. What happens if
> you use a partition on the 1st disk?
> 
> What chipset is the IDE controller? What device (e.g. sda1) are you
> exporting the disk partition into the domU as?
> 
> Are you sure dom 0 is idle when doing the dd test in the domU?
> 
> Ian
> 

Yes, I do the performance testing using dd. It's only a simple "benchmark", but
its results seem to indicate a fundamental issue. I did the tests with the same
partitions from Dom0 and from DomU. I used both disks, and in all experiments
Dom0 achieved 2.5 times the transfer rate of DomU.

I do not know the chipset of the IDE controller in my computer at home, while I
know that in the office it has a Promise raid controller ( I am neither at home
nor in the office at the moment ). I am sure that the system was idle during all
tests ( meaning that only the standard system was running, with no busy jobs and
no user program consuming CPU or IO resources ).

I am very interested in Xen, but I need to fix that problem.
If there is any "checklist" on how to configure XEN efficiently, I might be able
to fix the problem myself.

Thanks

Peter Bier 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: poor domU VBD performance.
@ 2005-03-28 18:55 Ian Pratt
  2005-03-28 19:33 ` Andrew Theurer
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-03-28 18:55 UTC (permalink / raw)
  To: Ian Pratt, Peter Bier, xen-devel


> > I found out that dom0 does file-system IO and raw IO ( using 
> > dd as a tool to test
> > throughput from the disk ) is about exactly the same as when 
> > using a standard 
> > linux kernel without XEN. But the raw IO from DomU to an 
> > unused disk ( a second
> > disk in the system ) is limited to fourty percent of the 
> > speed I get within Dom0.

OK, this looks like a performance bug that's crept into the 2.6 dom0
somewhere along the way. I'm surprised no-one else has spotted it. 

Please can you confirm that performance is OK if you use 2.4 as a dom0?
(It doesn't matter what you use as guests).


Thanks,
Ian

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-28 18:55 Ian Pratt
@ 2005-03-28 19:33 ` Andrew Theurer
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-28 19:33 UTC (permalink / raw)
  To: Ian Pratt, Peter Bier, xen-devel

On Monday 28 March 2005 12:55, Ian Pratt wrote:
> > > I found out that dom0 does file-system IO and raw IO ( using
> > > dd as a tool to test
> > > throughput from the disk ) is about exactly the same as when
> > > using a standard
> > > linux kernel without XEN. But the raw IO from DomU to an
> > > unused disk ( a second
> > > disk in the system ) is limited to fourty percent of the
> > > speed I get within Dom0.

Is the second disk exactly the same as the first one?  I'll try an IO test 
here on the same disk array with dom0 and domU and see what I get.

-Andrew
>
> OK, this looks like a performance bug that's crept into the 2.6 dom0
> somewhere along the way. I'm surprised no-one else has spotted it.
>
> Please can you confirm that performance is OK if you use 2.4 as a dom0?
> (It doesn't matter what you use as guests).
>
>
> Thanks,
> Ian
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: poor domU VBD performance.
@ 2005-03-28 20:14 Ian Pratt
  2005-03-28 20:18 ` Andrew Theurer
  2005-03-28 21:48 ` Andrew Theurer
  0 siblings, 2 replies; 60+ messages in thread
From: Ian Pratt @ 2005-03-28 20:14 UTC (permalink / raw)
  To: Andrew Theurer, Peter Bier, xen-devel

> > > > I found out that dom0 does file-system IO and raw IO ( using
> > > > dd as a tool to test
> > > > throughput from the disk ) is about exactly the same as when
> > > > using a standard
> > > > linux kernel without XEN. But the raw IO from DomU to an
> > > > unused disk ( a second
> > > > disk in the system ) is limited to fourty percent of the
> > > > speed I get within Dom0.
> 
> Is the second disk exactly the same as the first one?  I'll 
> try an IO test 
> here on the same disk array with dom0 and domU and see what I get.

I've reproduced the problem and it's a real issue. 

It only affects reads, and is almost certainly down to how the blkback
driver passes requests down to the actual device.

Does anyone on the list actually understand the changes made to linux
block IO between 2.4 and 2.6?

In the 2.6 blkfront there is no run_task_queue() to flush requests to
the lower layer, and we use submit_bio() instead of 2.4's
generic_make_request(). It looks like this is happening synchronously
rather than queueing multiple requests. What should we be doing to cause
things to be batched?

Thanks,
Ian

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-28 20:14 Ian Pratt
@ 2005-03-28 20:18 ` Andrew Theurer
  2005-03-28 21:48 ` Andrew Theurer
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-28 20:18 UTC (permalink / raw)
  To: Ian Pratt, Peter Bier, xen-devel

On Monday 28 March 2005 14:14, Ian Pratt wrote:
> > > > > I found out that dom0 does file-system IO and raw IO ( using
> > > > > dd as a tool to test
> > > > > throughput from the disk ) is about exactly the same as when
> > > > > using a standard
> > > > > linux kernel without XEN. But the raw IO from DomU to an
> > > > > unused disk ( a second
> > > > > disk in the system ) is limited to fourty percent of the
> > > > > speed I get within Dom0.
> >
> > Is the second disk exactly the same as the first one?  I'll
> > try an IO test
> > here on the same disk array with dom0 and domU and see what I get.
>
> I've reproduced the problem and its a real issue.
>
> It only affects reads, and is almost certainly down to how the blkback
> driver passes requests down to the actual device.
>
> Does anyone on the list actually understand the changes made to linux
> block IO between 2.4 and 2.6?
>
> In the 2.6 blkfront there is no run_task_queue() to flush requests to
> the lower layer, and we use submit_bio() instead of 2.4's
> generic_make_request(). It looks like this is happening syncronously
> rather than queueing multiple requests. What should we be doing to cause
> things to be batched?

There are multiple IO schedulers in 2.6.  Do you know which one is being used?  
It should say somewhere in the boot log.  Some read-ahead code also changed 
in 2.6.10-11 range.

So far I have not been able to reproduce this in xen-unstable with 2.6.  I am 
building xen-2.0.5 for a look.

-Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-28 20:14 Ian Pratt
  2005-03-28 20:18 ` Andrew Theurer
@ 2005-03-28 21:48 ` Andrew Theurer
  2005-03-28 23:38   ` Peter Bier
  1 sibling, 1 reply; 60+ messages in thread
From: Andrew Theurer @ 2005-03-28 21:48 UTC (permalink / raw)
  To: Ian Pratt, Peter Bier, xen-devel

On Monday 28 March 2005 14:14, Ian Pratt wrote:
> > > > > I found out that dom0 does file-system IO and raw IO ( using
> > > > > dd as a tool to test
> > > > > throughput from the disk ) is about exactly the same as when
> > > > > using a standard
> > > > > linux kernel without XEN. But the raw IO from DomU to an
> > > > > unused disk ( a second
> > > > > disk in the system ) is limited to fourty percent of the
> > > > > speed I get within Dom0.
> >
> > Is the second disk exactly the same as the first one?  I'll
> > try an IO test
> > here on the same disk array with dom0 and domU and see what I get.
>
> I've reproduced the problem and its a real issue.
> It only affects reads, and is almost certainly down to how the blkback
> driver passes requests down to the actual device.
>
> Does anyone on the list actually understand the changes made to linux
> block IO between 2.4 and 2.6?
>
> In the 2.6 blkfront there is no run_task_queue() to flush requests to
> the lower layer, and we use submit_bio() instead of 2.4's
> generic_make_request(). It looks like this is happening syncronously
> rather than queueing multiple requests. What should we be doing to cause
> things to be batched?

To my knowledge you cannot queue multiple bio requests at once.  The IO 
schedulers should batch them up before submitting to the actual devices.  I 
tried xen-2.0.5 and xen-unstable with a sequential read test using 256k 
request size and 8 reader threads with o_direct on an lvm-raid-0 SCSI array 
(no HW cache) and got:

xen-2-dom0-2.6.10:  177 MB/sec
xen-2-domU-2.6.10:  185 MB/sec
xen-3-dom0-2.6.11:  177 MB/sec
xen-3-domU-2.6.11:  185 MB/sec

Better results with VBD :)  I am wondering if going through 2 layers of IO 
schedulers streams the IO better.  I was using the AS scheduler.  I am going to 
try the noop scheduler and see what I get.

What block size were you using with dd?

-Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: poor domU VBD performance.
@ 2005-03-28 22:17 Ian Pratt
  2005-03-29  8:44 ` peter bier
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-03-28 22:17 UTC (permalink / raw)
  To: Andrew Theurer, Peter Bier, xen-devel

> tried xen-2.0.5 and xen-unstable with a sequential read test 
> using 256k 
> request size and 8 reader threads with o_direct on a 
> lvm-raid-0 scsci array 
> (no HW cache) and got:
> 
> xen-2-dom0-2.6.10:  177 MB/sec
> xen-2-domU-2.6.10:  185 MB/sec
> xen-3-dom0-2.6.11:  177 MB/sec
> xen-3-domU-2.6.11:  185 MB/sec

Please can you try a simple 'dd if=/dev/sdaXX of=/dev/null bs=1024k
count=4096'
to read 4GB from the partition both in dom0 and domU.

When booting, I get the following output, which I presume is the
default?
 elevator: using anticipatory as default io scheduler

Thanks,
Ian

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-28 21:48 ` Andrew Theurer
@ 2005-03-28 23:38   ` Peter Bier
  2005-03-29  0:27     ` Andrew Theurer
  0 siblings, 1 reply; 60+ messages in thread
From: Peter Bier @ 2005-03-28 23:38 UTC (permalink / raw)
  To: xen-devel

Andrew Theurer <habanero <at> us.ibm.com> writes:

> 
> On Monday 28 March 2005 14:14, Ian Pratt wrote:
> > > > > > I found out that dom0 does file-system IO and raw IO ( using
> > > > > > dd as a tool to test
> > > > > > throughput from the disk ) is about exactly the same as when
> > > > > > using a standard
> > > > > > linux kernel without XEN. But the raw IO from DomU to an
> > > > > > unused disk ( a second
> > > > > > disk in the system ) is limited to fourty percent of the
> > > > > > speed I get within Dom0.
> > >
> > > Is the second disk exactly the same as the first one?  I'll
> > > try an IO test
> > > here on the same disk array with dom0 and domU and see what I get.
> >
> > I've reproduced the problem and its a real issue.
> > It only affects reads, and is almost certainly down to how the blkback
> > driver passes requests down to the actual device.
> >
> > Does anyone on the list actually understand the changes made to linux
> > block IO between 2.4 and 2.6?
> >
> > In the 2.6 blkfront there is no run_task_queue() to flush requests to
> > the lower layer, and we use submit_bio() instead of 2.4's
> > generic_make_request(). It looks like this is happening syncronously
> > rather than queueing multiple requests. What should we be doing to cause
> > things to be batched?
> 
> To my knowlege you cannot queue multiple bio requests at once.  The IO 
> schedulers should batch them up before submitting to the actual devices.  I 
> tried xen-2.0.5 and xen-unstable with a sequential read test using 256k 
> request size and 8 reader threads with o_direct on a lvm-raid-0 scsci array 
> (no HW cache) and got:
> 
> xen-2-dom0-2.6.10:  177 MB/sec
> xen-2-domU-2.6.10:  185 MB/sec
> xen-3-dom0-2.6.11:  177 MB/sec
> xen-3-domU-2.6.11:  185 MB/sec
> 
> Better results with VBD :)  I am wondering if going through 2 layers of IO 
> schedulers streams the IO better.  I was using AS scheduler.  I am going to 
> try noop scheduler and see what i get.
> 
> What block size were you using with dd?
> 
> -Andrew
> 


My dd command was always the same: "dd if=/dev/hdb6 bs=64k count=1000" and it 
took 1.6 seconds on hdb6 and 2.2 seconds on hda1 when running in Dom0 and it
took 4.6 seconds on hdb6 and 5.8 seconds on hda1 when running on DomU. I did
one experiment with count=10000 and it took ten times as long in each of the
four cases.

I have done the following tests:
DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 301 sec
DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 370 sec

Dom0 : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 115 sec
Dom0 : dd if=/dev/hda1 of=/dev/null bs=1024k count=4000 ; duration 140 sec 

Peter
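
The dd durations above can be turned into throughput figures with a little
arithmetic. The following is only a sketch: it assumes each run read exactly
count=4000 blocks of bs=1024k (4000 MiB), and it takes the device names as
listed in the message (both DomU runs are listed as hdb6). The helper name
throughput_mib_s is made up for this illustration.

```python
# Durations (seconds) copied from the message above. Assumption: every run
# read count=4000 blocks of bs=1024k, i.e. 4000 MiB in total.
MIB_READ = 4000

runs = [
    ("DomU", "hdb6", 301),
    ("DomU", "hdb6", 370),  # device name as listed in the message
    ("Dom0", "hdb6", 115),
    ("Dom0", "hda1", 140),
]


def throughput_mib_s(mib, seconds):
    """Average throughput in MiB/s for `mib` MiB read in `seconds` seconds."""
    return mib / seconds


rates = [(dom, dev, throughput_mib_s(MIB_READ, secs)) for dom, dev, secs in runs]
for dom, dev, rate in rates:
    print(f"{dom} {dev}: {rate:5.1f} MiB/s")

# Ratio of the DomU and Dom0 durations for the same partition (hdb6).
ratio = 301 / 115
print(f"Dom0 is {ratio:.2f}x faster than DomU on hdb6")
```

The hdb6 ratio computed this way is close to the factor of 2.5 mentioned
earlier in the thread.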

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-27 17:41 Ian Pratt
  2005-03-28  8:48 ` peter bier
  2005-03-28 12:44 ` peter bier
@ 2005-03-29  6:20 ` Pasi Kärkkäinen
  2 siblings, 0 replies; 60+ messages in thread
From: Pasi Kärkkäinen @ 2005-03-29  6:20 UTC (permalink / raw)
  To: Ian Pratt; +Cc: Peter Bier, xen-devel, ian.pratt

On Sun, Mar 27, 2005 at 06:41:27PM +0100, Ian Pratt wrote:
> > I found out that dom0 does file-system IO and raw IO ( using 
> > dd as a tool to test
> > throughput from the disk ) is about exactly the same as when 
> > using a standard 
> > linux kernel without XEN. But the raw IO from DomU to an 
> > unused disk ( a second
> > disk in the system ) is limited to fourty percent of the 
> > speed I get within Dom0.
> 
> Just to be clear: you're doing a dd performance test within dom0 to the
> exact same partition on the 2nd disk that you're using when you start
> the domU and finding that the domU 'dd' performance is 40% of the dom0
> performance?
> 
> I've not heard of anyone else having problems like this. What happens if
> you use a partition on the 1st disk?
> 
> What chipset is the IDE controller? What device (e.g. sda1) are you
> exporting the disk partition into the domU as?
> 
> Are you sure dom 0 is idle when doing the dd test in the domU?
> 

I reported the same kind of problem earlier too. 

2.4 domU is really slow (1/3 the speed of 2.6 dom0); 2.6 domU is faster, but not even close 
to the speed of 2.6 dom0.

My tests were on top of lvm over sw-raid5.

-- Pasi Kärkkäinen
       
                                   ^
                                .     .
                                 Linux
                              /    -    \
                             Choice.of.the
                           .Next.Generation.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-28 22:17 Ian Pratt
@ 2005-03-29  8:44 ` peter bier
  0 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-29  8:44 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:


>  elevator: using anticipatory as default io scheduler
> 
> Thanks,
> Ian
> 
Yes, the output is 
   elevator: using anticipatory as default io scheduler


Peter

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-29  0:27     ` Andrew Theurer
@ 2005-03-29 11:39       ` peter bier
  0 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-29 11:39 UTC (permalink / raw)
  To: xen-devel

Andrew Theurer <habanero <at> us.ibm.com> writes:

> 
> > My dd command was always the same: "dd if=/dev/hdb6 bs=64k count=1000" and
> > it took 1.6 seconds on hdb6 and 2.2 seconds on hda1 when running in Dom0
> > and it took 4.6 seconds on hdb6 and 5.8 seconds on hda1 when running on
> > DomU. I did one experiment with count=10000 and it took ten times as long
> > in each of the four cases.
> >
> > I have done the following tests:
> > DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 301 sec
> > DomU : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 370 sec
> >
> > Dom0 : dd if=/dev/hdb6 of=/dev/null bs=1024k count=4000 ; duration 115 sec
> > Dom0 : dd if=/dev/hda1 of=/dev/null bs=1024k count=4000 ; duration 140 sec
> 
> OK, I have produced this with both dd and o-direct now.  On o-direct, I needed 
> what was the effective dd block request size (128k) and I got similar 
> results.  My results are much worse, due to that I am driving 14 disks:
> 
> dom0:	153.5 MB/sec
> domU:	 12.7 MB/sec
> 
> It looks like there might be a problem where we are not getting a timely 
> response back from dom0 VBD driver that the io request is complete, which 
> limits the number of outstanding requests to a level which cannot keep the 
> disk utilized well.  If you drive enough IO outstanding requests (which can 
> be done with either o-direct with large request or a much larger readahead 
> setting with buffered IO), it's not an issue. 
> 
> In the domU, can you try setting the readahead size to a much larger value 
> using hdparm? Something like hdparm -a 2028, then run dd?
> 
> -Andrew
> 

It's Tuesday now, and I am working in the office using my two machines with 
the Promise controller. The two differ in that one uses IDE disks, while 
the other, newer one has SATA disks. I have restricted myself to the 
older computer. 

It has one disk, a Maxtor 6Y120L0, 120 G with a 2048 KB Cache. On that machine 
the disk is hde and the exported slice is hde1. The slice is not in use and I 
am running the os from a loop-backed file as rootfs. I have done a 

"dd if=/dev/hde1 of=/dev/null bs=1024k count=1024"

in domU. 

hdparm reported that the default setup was 256k readahead.

I have tested the performance with the following readahead settings:

readahead    | duration
128 sectors  | 160 sec
256 sectors  | 76 sec
512 sectors  | 18.5 sec
1024 sectors | 19.5 sec
2048 sectors | 786 sec
1536 sectors | 775 sec
1200 sectors | 457 sec
1000 sectors | 20 sec
800 sectors  | 18.5 sec
600 sectors  | 18.5 sec

dom0 takes 18.0 secs no matter what the readahead setting in Dom0 is.

Peter 
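
To make the readahead table easier to compare, the durations can be converted
into throughput and into a slowdown relative to the 18.0 sec dom0 run. This is
a rough sketch under two assumptions: that the hdparm readahead value counts
512-byte sectors, and that each run read 1024 MiB (bs=1024k count=1024). The
function name summarize is made up for this illustration.

```python
SECTOR_BYTES = 512    # assumption: hdparm -a counts 512-byte sectors
MIB_READ = 1024       # bs=1024k count=1024
DOM0_SECONDS = 18.0   # dom0 duration reported in the message

# (readahead in sectors, domU duration in seconds), copied from the table
results = [
    (128, 160.0), (256, 76.0), (512, 18.5), (1024, 19.5), (2048, 786.0),
    (1536, 775.0), (1200, 457.0), (1000, 20.0), (800, 18.5), (600, 18.5),
]


def summarize(sectors, seconds):
    """Return (readahead in KiB, throughput in MiB/s, slowdown vs. dom0)."""
    readahead_kib = sectors * SECTOR_BYTES / 1024
    rate = MIB_READ / seconds
    slowdown = seconds / DOM0_SECONDS
    return readahead_kib, rate, slowdown


# Print sorted by readahead size rather than in the order the tests were run.
for sectors, seconds in sorted(results):
    kib, rate, slowdown = summarize(sectors, seconds)
    print(f"{sectors:5d} sectors ({kib:6.1f} KiB): "
          f"{rate:5.1f} MiB/s, {slowdown:5.1f}x dom0 time")
```

Sorted this way, the numbers show domU within a few percent of dom0 for
readahead values from 512 to 1024 sectors, and far slower both below 512 and
from 1200 sectors upward.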

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-29 14:19 Ian Pratt
@ 2005-03-29 15:27 ` peter bier
  0 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-29 15:27 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
> 
> > Would you mind repeating these experiments with a 2.4 dom0 
> > and a 2.6domU
> > ?
> 
> Also, please could you try exporting the device to the dom0 as a scsi
> device e.g. sda1 rather than ide device hde1 or hda1. [Yes, I know this
> shouldn't make any difference, but I have a suspicion it will.]
> 
> Thanks,
> Ian
> 

Ian, 

I will do the tests you asked for, but today is my wife's birthday, and I 
am already at home so I have no access to my test computers there. 

I have done some testing with the second, newer host with SATA disks. Changing
the readahead quantity had no effect on the reduced throughput. I do not
remember the exact ratio, but I think it was quite similar to that with the IDE 
disks and a readahead of 256 sectors.

I will report on that tomorrow in a more detailed fashion. 

And I will do the tests with linux 2.4 as domU 

Peter 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-29 22:45 ` RE: " Kurt Garloff
@ 2005-03-29 22:59   ` Andrew Theurer
  2005-03-29 23:19     ` Kurt Garloff
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Theurer @ 2005-03-29 22:59 UTC (permalink / raw)
  To: Kurt Garloff, Ian Pratt
  Cc: Vincent Hanquez, Xen development list, Jens Axboe,
	Christian Limpach

On Tuesday 29 March 2005 16:45, Kurt Garloff wrote:
> Hi Ian,
>
> On Tue, Mar 29, 2005 at 07:09:50PM +0100, Ian Pratt wrote:
> > We'd really appreciate your help on this, or from someone else at SuSE
> > who actually understands the Linux block layer?
>
> I'm Cc'ing Jens ...
>
> > In the 2.6 blkfront driver, what scheduler should we be registering
> > with? What should we be setting as max_sectors? Are there other
> > parameters we should be setting that we aren't? (block size?)
>
> I think noop is a good choice for secondary domains, as you don't
> want to be too clever there, otherwise you stack a clever scheduler
> on top of a clever scheduler. noop basically only does front- and
> backmerging to make the request sizes larger.
>
> But you probably should initialize the readahead sectors.
>
> Please test attached patch.

This should help the case where one is doing buffered IO (so readahead gets 
used) but for o_direct, I still think we will have a problem.  On Dom0, I can 
drive 58MB/sec with sequential read with o_direct with just a 32k request 
size, but on domU with the same request size I can only get ~6MB/sec.  I am 
still wondering if something is up with the backend driver.  It appears that 
the backend driver only submits requests to the actual device every 10ms. 
With a much larger request size (for o_direct) or a large readahead, 10ms is 
often enough to keep the disk streaming data.  With smaller request sizes or 
small readahead, the disk just doesn't read efficiently.  

-Andrew
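
The 10ms observation above suggests a simple back-of-the-envelope bound: if
only a fixed amount of data is handed to the disk per submission interval,
throughput cannot exceed that amount divided by the interval. The sketch below
is a simplified model, not a description of what the backend driver actually
does; the function name, the requests_per_tick parameter and the use of
1 MB = 10^6 bytes are assumptions made for illustration.

```python
def max_throughput_mb_s(request_kib, requests_per_tick=1, tick_ms=10.0):
    """Upper bound on throughput (MB/s, 1 MB = 10**6 bytes) when only
    `requests_per_tick` requests of `request_kib` KiB each reach the
    disk per `tick_ms` millisecond interval."""
    bytes_per_tick = request_kib * 1024 * requests_per_tick
    ticks_per_second = 1000.0 / tick_ms
    return bytes_per_tick * ticks_per_second / 1e6


# Compare a few request sizes under the hypothetical 10 ms submission interval.
for kib in (32, 128, 256, 1024):
    print(f"{kib:5d} KiB per tick: at most "
          f"{max_throughput_mb_s(kib):6.1f} MB/s")
```

Under this model a single 32k request per 10ms gives a ceiling in the low
single digits of MB/s, the same order of magnitude as the ~6MB/sec reported for
domU, while requests of several hundred KiB per interval would be needed to
reach the dom0 figure. Whether the driver really behaves this way is not
established in the thread.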

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-29 22:59   ` Andrew Theurer
@ 2005-03-29 23:19     ` Kurt Garloff
  2005-03-29 23:26       ` Andrew Theurer
  0 siblings, 1 reply; 60+ messages in thread
From: Kurt Garloff @ 2005-03-29 23:19 UTC (permalink / raw)
  To: Andrew Theurer
  Cc: Ian Pratt, Christian Limpach, Xen development list, Jens Axboe,
	Vincent Hanquez


[-- Attachment #1.1: Type: text/plain, Size: 1117 bytes --]

Hi Andrew,

On Tue, Mar 29, 2005 at 04:59:18PM -0600, Andrew Theurer wrote:
> On Tuesday 29 March 2005 16:45, Kurt Garloff wrote:
> > Please test attached patch.
> 
> This should help the case where one is doing buffered IO (so readahead gets 
> used) but for o_direct, I still think we will have a problem.  On Dom0, I can 
> drive 58MB/sec with sequential read with o_direct with just a 32k request 
> size, but on domU with the same request size I can only get ~6MB/sec.

I can't reproduce this.
Does this depend on whether your domU root is a loopback mounted file
or a real partition/LVM device?

> I am still wondering if something is up with the backend driver.  It
> appears that the backend driver only submits requests to the actual
> device every 10ms.  With a much larger request size (for o_direct) or
> a large readahead, 10ms is often enough to keep the disk streaming
> data.  With smaller request sizes or small readahead, the disk just
> doesn't read efficiently.  

We might have a problem with unplugging then.

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-29 23:19     ` Kurt Garloff
@ 2005-03-29 23:26       ` Andrew Theurer
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-29 23:26 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Christian Limpach, Xen development list, Jens Axboe,
	Vincent Hanquez

On Tuesday 29 March 2005 17:19, Kurt Garloff wrote:
> Hi Andrew,
>
> On Tue, Mar 29, 2005 at 04:59:18PM -0600, Andrew Theurer wrote:
> > On Tuesday 29 March 2005 16:45, Kurt Garloff wrote:
> > > Please test attached patch.
> >
> > This should help the case where one is doing buffered IO (so readahead
> > gets used) but for o_direct, I still think we will have a problem.  On
> > Dom0, I can drive 58MB/sec with sequential read with o_direct with just a
> > 32k request size, but on domU with the same request size I can only get
> > ~6MB/sec.
>
> I can't reproduce this.
> Does this depend on whether your domU root is a loopback mounted file
> or a real partition/LVM device?

I am not sure.  What program are you using for o_direct reads?  I use a real 
LVM device for domU root and then another whole disk for the read tests.

> > I am still wondering if something is up with the backend driver.  It
> > appears that the backend driver only submits requests to the actual
> > device every 10ms.  With a much larger request size (for o_direct) or
> > a large readahead, 10ms is often enough to keep the disk streaming
> > data.  With smaller request sizes or small readahead, the disk just
> > doesn't read efficiently.
>
> We might have a problem with unplugging then.

That's what I suspect, but I do not know the driver code well enough to say 
for sure.

-Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-30 11:16 RE: " Ian Pratt
@ 2005-03-30 17:01 ` peter bier
  2005-03-31  7:05 ` RE: " Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: peter bier @ 2005-03-30 17:01 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> 
> > I'll check the xen block driver to see if there's anything 
> > else that sticks out.
> >
> > Jens Axboe
> 
> Jens, I'd really appreciate this.
> 
> The blkfront/blkback drivers have rather evolved over time, and I don't
> think any of the core team fully understand the block-layer differences
> between 2.4 and 2.6. 
> 
> There's also some junk left in there from when the backend was in Xen
> itself back in the days of 1.2, though Vincent has prepared a patch to
> clean this up and also make 'refreshing' of vbd's work (for size
> changes), and also allow the blkfront driver to import whole disks
> rather than partitions. We had this functionality on 2.4, but lost it in
> the move to 2.6.
> 
> My bet is that it's the 2.6 backend that is where the true performance
> bug lies. Using a 2.6 domU blkfront talking to a 2.4 dom0 blkback seems
> to give good performance under a wide variety of circumstances. Using a
> 2.6 dom0 is far more pernickety. I agree with Andrew; I suspect it's
> the work queue changes that are biting us when we don't have many
> outstanding requests.
> 
> Thanks,
> Ian
> 

I have done my simple dd on hde1 with two different settings of readahead:
256 sectors and 512 sectors.

These are the results:

DOM0 readahead 512s

Device:     rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hde      115055.40   2.00   592.40  0.80 115647.80  22.40 57823.90 11.20   194.99     2.30  3.88  1.68  99.80
hda           0.00   0.00     0.00  0.00      0.00   0.00     0.00  0.00     0.00     0.00  0.00  0.00   0.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00   31.60   14.20   54.00

DOMU readahead 512s

Device:     rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hda1          0.00   0.20     0.00  0.00      0.00   3.20     0.00  1.60     0.00     0.00  0.00  0.00   0.00
hde1     102301.40   0.00 11571.00  0.00 113868.80   0.00 56934.40  0.00     9.84    68.45  5.92  0.09 100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.00    0.00   35.00   65.00    0.00

DOM0 readahead 256s

Device:     rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hde       28289.20   1.80   126.80  0.40  28416.00  17.60 14208.00  8.80   223.53     1.06  8.32  7.85  99.80
hda           0.00   0.00     0.00  0.00      0.00   0.00     0.00  0.00     0.00     0.00  0.00  0.00   0.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.60    5.60   92.60

DOMU readahead 256s

Device:     rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hda1          0.00   0.20     0.00  0.40      0.00   4.80     0.00  2.40    12.00     0.00  0.00  0.00   0.00
hde1      25085.60   0.00  3330.40  0.00  28416.00   0.00 14208.00  0.00     8.53    30.54  9.17  0.30 100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.40   98.40    0.00

What surprises me is that the service time for requests in DOM0 decreases
dramatically when readahead is increased from 256 to 512 sectors. If the output
of iostat is reliable, it tells me that requests in DOMU are assembled to about
8 to 10 sectors in size, while DOM0 puts them together to about 200 or even
more sectors.
Using a readahead of 256 sectors results in an average queue size of about 1,
while changing readahead to 512 sectors results in an average queue size of
slightly above 2 on DOM0. Service times in DOM0 with a readahead of 256 sectors
seem to be in the range of the typical seek time of a modern IDE disk, while
they are significantly lower with a readahead of 512 sectors.
As I have mentioned, this is the system with only one installed disk; this
results in the write activity on the disk. The two write requests per second
go into a different partition and result in four required seeks per
second. This should not be a reason for all requests to take about one seek
time as service time.

I have done a number of further tests on various systems. In most cases I failed
to achieve service times below 8 msecs in Dom0; the only counterexample is
reported above. It seems to me that at low readahead values the amount of
data requested from disk is simply the readahead amount. Each such
request takes about one seek time, and thus I get lower performance when I work
with small readahead values.
What I do not understand at all is why throughput collapses with large
readahead sizes.

I found in mm/readahead.c that the readahead size for a file is updated if
the readahead is not efficient. I suspect that this mechanism might lead to
readahead being switched off for this file.
With readahead set to 2048 sectors, the product of avgqu-sz and avgrq-sz
reported by iostat drops to 4 to 5 physical pages.

Peter 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31  7:10   ` Jens Axboe
@ 2005-03-31  8:17     ` Keir Fraser
  2005-03-31  8:19       ` Jens Axboe
  0 siblings, 1 reply; 60+ messages in thread
From: Keir Fraser @ 2005-03-31  8:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On 31 Mar 2005, at 08:10, Jens Axboe wrote:

> Here is a temporary work-around, this should bring you close to 100%
> performance at the cost of some extra unplugs. Uncompiled.

Yep, this does the job for me. Thanks! Avoiding the extra unplugs is 
harder than it sounds as each request in a batch may go to a different 
request queue. To minimise the number of unplugs per batch we'd need to 
add code to remember which queues we had used in the current batch, 
then kick them at the end of the batch. Is there likely to be any 
measurable benefit from reducing the number of unplugs?

  -- Keir

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31  8:17     ` Keir Fraser
@ 2005-03-31  8:19       ` Jens Axboe
  2005-03-31 14:33         ` Philip R Auld
  0 siblings, 1 reply; 60+ messages in thread
From: Jens Axboe @ 2005-03-31  8:19 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 08:10, Jens Axboe wrote:
> 
> >Here is a temporary work-around, this should bring you close to 100%
> >performance at the cost of some extra unplugs. Uncompiled.
> 
> Yep, this does the job for me. Thanks! Avoiding the extra unplugs is 
> harder than it sounds as each request in a batch may go to a different 
> request queue. To minimise the number of unplugs per batch we'd need to 
> add code to remember which queues we had used in the current batch, 
> then kick them at the end of the batch. Is there likely to be any 

Or just keep track of the previous queue, if that has changed unplug the
previous queue and update previous queue variable.

> measurable benefit from reducing the number of unplugs?

Probably not, since the plugging happened at the front end as well. So
you should get a nice stream of io in any way.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31  8:19       ` Jens Axboe
@ 2005-03-31 14:33         ` Philip R Auld
  2005-03-31 15:34           ` Kurt Garloff
  0 siblings, 1 reply; 60+ messages in thread
From: Philip R Auld @ 2005-03-31 14:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 10:19:01AM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> > measurable benefit from reducing the number of unplugs?
> 
> Probably not, since the plugging happened at the front end as well. So
> you should get a nice stream of io in any way.

This affects merging though, right? I don't think the front
end has done any merging.

Also the BIO_RW_SYNC bit is sometimes ignored in __make_request
due to the bad queue locking interactions with scsi_request_fn.

The bio can be completed before the bio_sync() test in 
__make_request. Since there is no other reference to the bio it 
can be freed and reused by the time it is tested for BIO_RW_SYNC.

Cheers,

Phil


> 
> -- 
> Jens Axboe

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 14:33         ` Philip R Auld
@ 2005-03-31 15:34           ` Kurt Garloff
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 16:53             ` Philip R Auld
  0 siblings, 2 replies; 60+ messages in thread
From: Kurt Garloff @ 2005-03-31 15:34 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
	Christian Limpach


[-- Attachment #1.1: Type: text/plain, Size: 412 bytes --]

Hi,

On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> This affects merging though, right? I don't think the front
> end has done any merging.

The noop elevator does front and back merging.
My understanding is that it's used in the frontend driver.

Otherwise, unplugging on every block would indeed be quite bad ...

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

[-- Attachment #1.2: Type: application/pgp-signature, Size: 189 bytes --]


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:34           ` Kurt Garloff
@ 2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
                                 ` (2 more replies)
  2005-03-31 16:53             ` Philip R Auld
  1 sibling, 3 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 15:39 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Kurt Garloff wrote:
> Hi,
> 
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging.
> 
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.
> 
> Otherwise, unplugging on every block would indeed be quite bad ...

Not necessarily - either your io rate is not fast enough to sustain a
substantial queue depth, in that case you get plugging on basically
every io anyways. If on the other hand the io rate is high enough to
maintain a queue depth of > 1, then the plugging will never take place
because the queue never empties.

So all in all, I don't think the temporary work-around will be such a
bad idea. I would still rather implement the queue tracking though, it
should not be more than a few lines of code.

And Philip, I will get the bio_sync() change merged :-)

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
@ 2005-03-31 15:41               ` Jens Axboe
  2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:55               ` Philip R Auld
  2 siblings, 1 reply; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 15:41 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Jens Axboe wrote:
> On Thu, Mar 31 2005, Kurt Garloff wrote:
> > Hi,
> > 
> > On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging.
> > 
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
> > 
> > Otherwise, unplugging on every block would indeed be quite bad ...
> 
> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
> 
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.

There are still cases where it will be suboptimal of course, I didn't
intend to claim it will always be as fast as queue tracking! If you are
unlucky enough that the first request will reach the target device and
get started before the next one, you will have a small and a large part
of any given request executed. This isn't good for performance,
naturally. But queueing is so fast, I would be surprised if this
happened much in the real world.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
@ 2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:02                 ` Andrew Theurer
  2005-03-31 17:44                 ` Jens Axboe
  2005-03-31 16:55               ` Philip R Auld
  2 siblings, 2 replies; 60+ messages in thread
From: Keir Fraser @ 2005-03-31 15:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach


On 31 Mar 2005, at 16:39, Jens Axboe wrote:

> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
>
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.

I've checked in something along the lines of what you described into 
both the 2.0-testing and the unstable trees. Looks to have identical 
performance to the original simple patch, at least for a bulk 'dd'.

  -- Keir

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:49               ` Keir Fraser
@ 2005-03-31 16:02                 ` Andrew Theurer
  2005-03-31 17:44                 ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-31 16:02 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Jens Axboe, Christian Limpach

Keir Fraser wrote:

>
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
>
>> Not necessarily - either your io rate is not fast enough to sustain a
>> substantial queue depth, in that case you get plugging on basically
>> every io anyways. If on the other hand the io rate is high enough to
>> maintain a queue depth of > 1, then the plugging will never take place
>> because the queue never empties.
>>
>> So all in all, I don't think the temporary work-around will be such a
>> bad idea. I would still rather implement the queue tracking though, it
>> should not be more than a few lines of code.
>
>
> I've checked in something along the lines of what you described into 
> both the 2.0-testing and the unstable trees. Looks to have identical 
> performance to the original simple patch, at least for a bulk 'dd'.

I'll do a pull of unstable and see what I get with o_direct, thanks.

-Andrew

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:41               ` Jens Axboe
@ 2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 17:43                   ` Jens Axboe
  2005-03-31 18:27                   ` Kurt Garloff
  0 siblings, 2 replies; 60+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 16:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

Jens Axboe wrote:

> There are still cases where it will be suboptimal of course, I didn't
> intend to claim it will always be as fast as queue tracking! If you are
> unlucky enough that the first request will reach the target device and
> get started before the next one, you will have a small and a large part
> of any given request executed. This isn't good for performance,
> naturally. But queueing is so fast, I would be surprised if this
> happened much in the real world.

Although the usual answer for what scheduling algorithm is
best is almost always "depends on the workload", it was
suggested to me that the cfq was still the best option to
go with. What do people feel about that? (Or is AS going
to remain default?).

Also, we're making the assumption here that guest OS = virtual
driver/device. I would rather we not make that assumption
always. This may be moot because I was also told there might
be a patch floating around (-mm ?) that allows you to
select scheduling algorithm on a per-device basis. Anyone
know if this is going to come in anytime soon?

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:34           ` Kurt Garloff
  2005-03-31 15:39             ` Jens Axboe
@ 2005-03-31 16:53             ` Philip R Auld
  2005-03-31 18:01               ` Jens Axboe
  1 sibling, 1 reply; 60+ messages in thread
From: Philip R Auld @ 2005-03-31 16:53 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 05:34:49PM +0200 Kurt Garloff said:
> Hi,
> 
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging.
> 
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.

If that is the case, it can only merge things that are 
machine contiguous. Current guests know this mapping, but 
can they get this when running unmodified with VT-x?

My experience showed very little if any multipage 
IO coming out of the front end.

> 
> Otherwise, unplugging on every block would indeed be quite bad ...

Seems to be somewhat moot anyway given the current change planned :)

Cheers,

Phil
> 
> Regards,
> -- 
> Kurt Garloff, Director SUSE Labs, Novell Inc.



-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
  2005-03-31 15:49               ` Keir Fraser
@ 2005-03-31 16:55               ` Philip R Auld
  2 siblings, 0 replies; 60+ messages in thread
From: Philip R Auld @ 2005-03-31 16:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 05:39:26PM +0200 Jens Axboe said:
> 
> And Philip, I will get the bio_sync() change merged :-)


Thanks! It's good to be transparent ;)



Phil

> 
> -- 
> Jens Axboe

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 16:27                 ` Nivedita Singhvi
@ 2005-03-31 17:43                   ` Jens Axboe
  2005-03-31 18:27                   ` Kurt Garloff
  1 sibling, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 17:43 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Nivedita Singhvi wrote:
> Jens Axboe wrote:
> 
> >There are still cases where it will be suboptimal of course, I didn't
> >intend to claim it will always be as fast as queue tracking! If you are
> >unlucky enough that the first request will reach the target device and
> >get started before the next one, you will have a small and a large part
> >of any given request executed. This isn't good for performance,
> >naturally. But queueing is so fast, I would be surprised if this
> >happened much in the real world.
> 
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).

Really the only one that you should not use is AS, anything else will be
fine. AS should only ever be used at the bottom of the stack, if on a
single spindle backing. CFQ will be fine, as will deadline and noop.

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone
> know if this is going to come in anytime soon?

That patch is in mainline since 2.6.10. You can change schedulers by
echoing the preferred scheduler to /sys/block/<device>/queue/scheduler -
reading that file will show you what schedulers are available.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: poor domU VBD performance.
  2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:02                 ` Andrew Theurer
@ 2005-03-31 17:44                 ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 17:44 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
> 
> >Not necessarily - either your io rate is not fast enough to sustain a
> >substantial queue depth, in that case you get plugging on basically
> >every io anyways. If on the other hand the io rate is high enough to
> >maintain a queue depth of > 1, then the plugging will never take place
> >because the queue never empties.
> >
> >So all in all, I don't think the temporary work-around will be such a
> >bad idea. I would still rather implement the queue tracking though, it
> >should not be more than a few lines of code.
> 
> I've checked in something along the lines of what you described into 
> both the 2.0-testing and the unstable trees. Looks to have identical 
> performance to the original simple patch, at least for a bulk 'dd'.

Can you post the patch here for review? Or just point me somewhere I can
view it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 60+ messages in thread

* RE: poor domU VBD performance.
@ 2005-03-31 17:55 Ian Pratt
  2005-03-31 18:04 ` Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Ian Pratt @ 2005-03-31 17:55 UTC (permalink / raw)
  To: Jens Axboe, Keir Fraser
  Cc: Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Christian Limpach


> > I've checked in something along the lines of what you described into
> > both the 2.0-testing and the unstable trees. Looks to have identical
> > performance to the original simple patch, at least for a bulk 'dd'.
>
> Can you post the patch here for review? Or just point me somewhere I can
> view it.

Jens,

Thanks for your help on this.

Here's Keir's updated patch:
http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424c1abd7LgWMiaskLEEAAX7ffdkXQ

Which is based on this earlier patch from you:
http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424bba4091aV1FuNksY_4w_z4Tvr3g


Best,
Ian

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:52:27 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:52:27 -08:00
@@ -481,7 +481,6 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
-        struct bio_vec *bv;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -494,17 +493,14 @@
         bio->bi_private = pending_req;
         bio->bi_end_io  = end_block_io_op;
         bio->bi_sector  = phys_seg[i].sector_number;
-        bio->bi_rw      = operation;
 
-        bv = bio_iovec_idx(bio, 0);
-        bv->bv_page   = virt_to_page(MMAP_VADDR(pending_idx, i));
-        bv->bv_len    = phys_seg[i].nr_sects << 9;
-        bv->bv_offset = phys_seg[i].buffer & ~PAGE_MASK;
+        bio_add_page(
+            bio,
+            virt_to_page(MMAP_VADDR(pending_idx, i)),
+            phys_seg[i].nr_sects << 9,
+            phys_seg[i].buffer & ~PAGE_MASK);
 
-        bio->bi_size    = bv->bv_len;
-        bio->bi_vcnt++;
-
-        submit_bio(operation, bio);
+        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
     }
 #endif
 
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/31 09:52:16+01:00 kaf24@firebug.cl.cam.ac.uk 
#   Backport of Jens blkdev performance patch. I accidentally applied it
#   first to unstable.
# 
# linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
#   2005/03/31 09:52:15+01:00 kaf24@firebug.cl.cam.ac.uk +6 -10
#   Backport of Jens blkdev performance patch. I accidentally applied it
#   first to unstable.
# 

diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c	2005-03-31 09:54:46 -08:00
@@ -66,6 +66,19 @@
 
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
 static kmem_cache_t *buffer_head_cachep;
+#else
+static request_queue_t *plugged_queue;
+void bdev_put(struct block_device *bdev)
+{
+    request_queue_t *q = plugged_queue;
+    /* We might be giving up last reference to plugged queue. Flush if so. */
+    if ( (q != NULL) &&
+         (q == bdev_get_queue(bdev)) && 
+         (cmpxchg(&plugged_queue, q, NULL) == q) )
+        blk_run_queue(q);
+    /* It's now safe to drop the block device. */
+    blkdev_put(bdev);
+}
 #endif
 
 static int do_block_io_op(blkif_t *blkif, int max_to_do);
@@ -176,9 +189,15 @@
             blkif_put(blkif);
         }
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
         /* Push the batch through to disc. */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
         run_task_queue(&tq_disk);
+#else
+        if ( plugged_queue != NULL )
+        {
+            blk_run_queue(plugged_queue);
+            plugged_queue = NULL;
+        }
 #endif
     }
 }
@@ -481,6 +500,7 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
+        request_queue_t *q;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -500,7 +520,14 @@
             phys_seg[i].nr_sects << 9,
             phys_seg[i].buffer & ~PAGE_MASK);
 
-        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
+        if ( (q = bdev_get_queue(bio->bi_bdev)) != plugged_queue )
+        {
+            if ( plugged_queue != NULL )
+                blk_run_queue(plugged_queue);
+            plugged_queue = q;
+        }
+
+        submit_bio(operation, bio);
     }
 #endif
 
diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h b/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h	2005-03-31 09:54:46 -08:00
@@ -30,8 +30,10 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
 typedef struct rb_root rb_root_t;
 typedef struct rb_node rb_node_t;
+extern void bdev_put(struct block_device *bdev);
 #else
 struct block_device;
+#define bdev_put(_b) ((void)0)
 #endif
 
 typedef struct blkif_st {
diff -Naru a/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c b/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c
--- a/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c	2005-03-31 09:54:46 -08:00
+++ b/linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c	2005-03-31 09:54:46 -08:00
@@ -150,7 +150,7 @@
     {
         DPRINTK("vbd_grow: device %08x doesn't exist.\n", x->extent.device);
         grow->status = BLKIF_BE_STATUS_EXTENT_NOT_FOUND;
-        blkdev_put(x->bdev);
+        bdev_put(x->bdev);
         goto out;
     }
 
@@ -255,7 +255,7 @@
     *px = x->next; /* ATOMIC: no need for vbd_lock. */
 
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-    blkdev_put(x->bdev);
+    bdev_put(x->bdev);
 #endif
     kfree(x);
 
@@ -307,7 +307,7 @@
     {
         t = x->next;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-        blkdev_put(x->bdev);
+        bdev_put(x->bdev);
 #endif
         kfree(x);
         x = t;
@@ -335,7 +335,7 @@
         {
             t = x->next;
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,0)
-            blkdev_put(x->bdev);
+            bdev_put(x->bdev);
 #endif
             kfree(x);
             x = t;
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/31 16:43:57+01:00 kaf24@firebug.cl.cam.ac.uk 
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
# 
# linux-2.6.11-xen-sparse/drivers/xen/blkback/blkback.c
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +29 -2
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
# 
# linux-2.6.11-xen-sparse/drivers/xen/blkback/common.h
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +2 -0
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
# 
# linux-2.6.11-xen-sparse/drivers/xen/blkback/vbd.c
#   2005/03/31 16:43:56+01:00 kaf24@firebug.cl.cam.ac.uk +4 -4
#   Backport of batched request_queue unplugging in blkback driver.
#   Signed-off-by: Keir Fraser <keir@xensource.com>
# 

* Re: poor domU VBD performance.
  2005-03-31 16:53             ` Philip R Auld
@ 2005-03-31 18:01               ` Jens Axboe
  2005-03-31 18:43                 ` Philip R Auld
  0 siblings, 1 reply; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 18:01 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging. 
> > 
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
> 
> If that is the case, it can only merge things that are 
> machine contiguous. Current guests know this mapping, but 
> can they get this when running unmodified with VT-x?
> 
> My experience showed very little if any multipage 
> IO coming out of the front end.

There aren't that many users of multipage ios yet. direct io will use
it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
definitely improving :-)

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 17:55 poor domU VBD performance Ian Pratt
@ 2005-03-31 18:04 ` Jens Axboe
  2005-03-31 18:57   ` Keir Fraser
  2005-03-31 20:49 ` Andrew Theurer
  2005-04-01 16:36 ` peter bier
  2 siblings, 1 reply; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 18:04 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Ian Pratt wrote:
> 
> > > I've checked in something along the lines of what you described
> > > into both the 2.0-testing and the unstable trees. Looks to have
> > > identical performance to the original simple patch, at least for
> > > a bulk 'dd'.
> > 
> > Can you post the patch here for review? Or just point me 
> > somewhere I can view it.
> 
> Jens,
> 
> Thanks for your help on this.
> 
> Here's Keir's updated patch:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424c1abd7LgWMiaskLEEAAX7ffdkXQ
> 
> Which is based on this earlier patch from you:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424bba4091aV1FuNksY_4w_z4Tvr3g

I cannot immediately see if you call bdev_put() right after queueing the
io? If so, I think the patch looks fine. If not, you are missing the
last unplug :-)

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 17:43                   ` Jens Axboe
@ 2005-03-31 18:27                   ` Kurt Garloff
  2005-03-31 21:59                     ` Nivedita Singhvi
  1 sibling, 1 reply; 60+ messages in thread
From: Kurt Garloff @ 2005-03-31 18:27 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Jens Axboe, Christian Limpach


Hi Niv,

On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).

This is a different discussion.
But, yes, I would agree that CFQ (v3) is the best default choice.

Jens, should we maybe make sure that the blkback driver uses
different (fake) UIDs for the domains that it serves, to provide
fairness between them? The next step would be to allow tweaking
IO priorities. Or, to make it more general, add a parameter (call
it uid) that a block driver can pass down to the IO scheduler,
which would normally be current->uid but may be set differently?

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone

It's part of 2.6.11.
garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
noop anticipatory deadline [cfq]
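
The entry in square brackets is the scheduler currently active for that
device. As a small illustration, here is a sketch in C of pulling that
name out of such a line. active_scheduler() is a hypothetical helper
written for this example, not a kernel or libc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "noop anticipatory deadline [cfq]" into out.
 * Returns 0 on success, -1 if there is no bracketed name or it does
 * not fit into out. Hypothetical helper, for illustration only.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    const char *lb, *rb;
    size_t len;

    assert(line != NULL && out != NULL);

    lb = strchr(line, '[');
    if (lb == NULL)
        return -1;
    rb = strchr(lb + 1, ']');
    if (rb == NULL)
        return -1;

    len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)      /* need room for the terminating NUL */
        return -1;

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```

Called on the line above, it yields "cfq". Writing one of the listed
names into the same sysfs file switches the scheduler for that device.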

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

* Re: poor domU VBD performance.
  2005-03-31 18:01               ` Jens Axboe
@ 2005-03-31 18:43                 ` Philip R Auld
  2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:21                   ` Jens Axboe
  0 siblings, 2 replies; 60+ messages in thread
From: Philip R Auld @ 2005-03-31 18:43 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Philip R Auld wrote:
> > 
> > My experience showed very little if any multipage 
> > IO coming out of the front end.
> 
> There aren't that many users of multipage ios yet. direct io will use
> it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> definitely improving :-)

Sorry, I was being sloppy with terminology :)

What I was getting at was that the backend will split requests
up and issue each physical segment as a separate bio (at least in 
the 2.0.5 tree I have in front of me). And that none of these 
physical segments was more than 1 page. 

So the request merging in the back end OS is important, no?


Cheers,

Phil

> 
> -- 
> Jens Axboe

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: poor domU VBD performance.
  2005-03-31 18:04 ` Jens Axboe
@ 2005-03-31 18:57   ` Keir Fraser
  2005-03-31 19:22     ` Jens Axboe
  0 siblings, 1 reply; 60+ messages in thread
From: Keir Fraser @ 2005-03-31 18:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach


On 31 Mar 2005, at 19:04, Jens Axboe wrote:

> I cannot immediately see if you call bdev_put() right after queueing 
> the
> io? If so, I think the patch looks fine. If not, you are missing the
> last unplug :-)

That's not the job of bdev_put(): the final unplug is done at the end 
of blkio_schedule -- the same place that I do a run_task_queue() when 
compiling for Linux 2.4.
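
As a rough userspace model of that batching (a sketch only: model_queue,
model_submit() and model_end_of_pass() are made-up names, and this is
not the blkback code), a queue is kicked when the target of the next
request changes, and whatever is still plugged is kicked once at the
end of the pass:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a request queue; it only counts events. */
struct model_queue {
    int kicks;      /* times the queue was told to start processing */
    int submitted;  /* requests queued on it */
};

/* Queue that has received requests but has not been kicked yet. */
static struct model_queue *plugged_queue;

static void kick(struct model_queue *q)
{
    q->kicks++;
}

/* Queue one request; kick the previous queue only when the target changes. */
void model_submit(struct model_queue *q)
{
    assert(q != NULL);
    if (q != plugged_queue) {
        if (plugged_queue != NULL)
            kick(plugged_queue);
        plugged_queue = q;
    }
    q->submitted++;
}

/* End of one scheduling pass: push out whatever is still plugged. */
void model_end_of_pass(void)
{
    if (plugged_queue != NULL) {
        kick(plugged_queue);
        plugged_queue = NULL;
    }
}
```

In this model, eight requests to one queue followed by four to another
cost two kicks in total instead of twelve.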

  Cheers,
  Keir

* Re: poor domU VBD performance.
  2005-03-31 18:43                 ` Philip R Auld
@ 2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:10                     ` Keir Fraser
  2005-03-31 19:20                     ` Jens Axboe
  2005-03-31 19:21                   ` Jens Axboe
  1 sibling, 2 replies; 60+ messages in thread
From: Keir Fraser @ 2005-03-31 19:07 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Jens Axboe, Christian Limpach

> What I was getting at was that the backend will split requests
> up and issue each physical segment as a separate bio (at least in
> the 2.0.5 tree I have in front of me). And that none of these
> physical segments was more than 1 page.
>
> So the request merging in the back end OS is important, no?

Ah, this reminds me I have one more question for Jens.

Since all the bio's that I queue up in a single invocation of 
dispatch_rw_block_io() will actually be adjacent to each other (because 
they're all from the same scatter-gather list) can I actually do 
something like (very roughly):

bio = bio_alloc(GFP_KERNEL, nr_psegs);
for ( i = 0; i < nr_psegs; i++ )
    bio_add_page(bio, blah...);
submit_bio(operation, bio);

Each of the biovecs that I queue may not be a full page in size (but 
won't straddle a page boundary of course).

This would avoid the bio's having to be merged again later.

  -- Keir

* Re: poor domU VBD performance.
  2005-03-31 19:07                   ` Keir Fraser
@ 2005-03-31 19:10                     ` Keir Fraser
  2005-03-31 19:20                     ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Keir Fraser @ 2005-03-31 19:10 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach


On 31 Mar 2005, at 20:07, Keir Fraser wrote:

> Since all the bio's that I queue up in a single invocation of 
> dispatch_rw_block_io() will actually be adjacent to each other 
> (because they're all from the same scatter-gather list)

I should add: I know that the code makes it look like each s-g element 
might map somewhere entirely different from the previous one, but we no 
longer support that mode of operation. Each VBD now always maps onto a 
single, entire block device or partition.

  -- Keir

* Re: poor domU VBD performance.
  2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:10                     ` Keir Fraser
@ 2005-03-31 19:20                     ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 19:20 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> >What I was getting at was that the backend will split requests
> >up and issue each physical segment as a separate bio (at least in
> >the 2.0.5 tree I have in front of me). And that none of these
> >physical segments was more than 1 page.
> >
> >So the request merging in the back end OS is important, no?
> 
> Ah, this reminds me I have one more question for Jens.
> 
> Since all the bio's that I queue up in a single invocation of 
> dispatch_rw_block_io() will actually be adjacent to each other (because 
> they're all from the same scatter-gather list) can I actually do 
> something like (very roughly):
> 
> bio = bio_alloc(GFP_KERNEL, nr_psegs);
> for ( i = 0; i < nr_psegs; i++ )
>    bio_add_page(bio, blah...);
> submit_bio(operation, bio);
> 
> Each of the biovecs that I queue may not be a full page in size (but 
> won't straddle a page boundary of course).

Yes, this is precisely what you should do, the current method is pretty
suboptimal. Basically allocate a bio with nr_psegs, and call
bio_add_page() for each page until it returns _less_ than the number of
bytes you requested. When it does that, submit that bio for io and
allocate a new bio with nr_psegs-submitted_segs bio_vecs attached.
Continue until you are done.
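
The loop described above can be sketched as a userspace model. This is
an illustration under stated assumptions, not kernel code: model_bio,
model_add() and model_dispatch() are made-up stand-ins, and the 32 KB
per-bio limit is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BIO_MAX_BYTES 32768u  /* assumed per-bio limit, illustration only */

struct model_bio {
    unsigned int bytes;  /* payload accumulated so far */
    unsigned int vecs;   /* segments attached so far */
};

/* Returns len if the segment fits, 0 otherwise ("less than requested"). */
static unsigned int model_add(struct model_bio *bio, unsigned int len)
{
    if (bio->bytes + len > MODEL_BIO_MAX_BYTES)
        return 0;
    bio->bytes += len;
    bio->vecs++;
    return len;
}

/*
 * Pack nr_segs segments into as few bios as the limit allows.
 * Returns the number of bios that would have been submitted.
 */
unsigned int model_dispatch(const unsigned int *seg_len, size_t nr_segs)
{
    struct model_bio bio = {0, 0};
    unsigned int submitted = 0;
    size_t i = 0;

    assert(seg_len != NULL || nr_segs == 0);
    while (i < nr_segs) {
        /* a single segment must fit into an empty bio */
        assert(seg_len[i] <= MODEL_BIO_MAX_BYTES);
        if (model_add(&bio, seg_len[i]) < seg_len[i]) {
            /* bio is full: submit it, start a fresh one, retry the segment */
            submitted++;
            bio.bytes = 0;
            bio.vecs = 0;
            continue;
        }
        i++;
    }
    if (bio.vecs > 0)
        submitted++;  /* final, possibly partial, bio */
    return submitted;
}
```

With eleven 4 KB segments this model submits two bios, where issuing
one bio per segment would have submitted eleven.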

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 18:43                 ` Philip R Auld
  2005-03-31 19:07                   ` Keir Fraser
@ 2005-03-31 19:21                   ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 19:21 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Philip R Auld wrote:
> Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> > On Thu, Mar 31 2005, Philip R Auld wrote:
> > > 
> > > My experience showed very little if any multipage 
> > > IO coming out of the front end.
> > 
> > There aren't that many users of multipage ios yet. direct io will use
> > it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> > definitely improving :-)
> 
> Sorry, I was being sloppy with terminology :)
> 
> What I was getting at was that the backend will split requests
> up and issue each physical segment as a separate bio (at least in 
> the 2.0.5 tree I have in front of me). And that none of these 
> physical segments was more than 1 page. 
> 
> So the request merging in the back end OS is important, no?

I suppose it always is, since the merge criteria may have changed from
when the io was initially queued. If requests are always split into
single pages, then it becomes very important to merge at the backend.
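
To illustrate why that matters, here is a minimal sketch of back
merging in userspace C. It assumes requests arrive sorted by start
sector and ignores the size and segment limits a real elevator
enforces; model_req and model_back_merge() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

struct model_req {
    unsigned long sector;    /* start sector */
    unsigned long nsectors;  /* length in sectors */
};

/*
 * Back-merge in place: if a request starts exactly where the previous
 * one ends, extend the previous one instead of keeping both.
 * Returns the number of requests left after merging.
 */
size_t model_back_merge(struct model_req *reqs, size_t n)
{
    size_t out = 0;

    assert(reqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (out > 0 &&
            reqs[out - 1].sector + reqs[out - 1].nsectors == reqs[i].sector) {
            reqs[out - 1].nsectors += reqs[i].nsectors;
        } else {
            reqs[out++] = reqs[i];
        }
    }
    return out;
}
```

Four single-page requests (8 sectors of 512 bytes each) starting at
sectors 0, 8, 16 and 64 collapse into two: one of 24 sectors at 0 and
one of 8 sectors at 64.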

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 18:57   ` Keir Fraser
@ 2005-03-31 19:22     ` Jens Axboe
  0 siblings, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-03-31 19:22 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 19:04, Jens Axboe wrote:
> 
> >I cannot immediately see if you call bdev_put() right after queueing 
> >the
> >io? If so, I think the patch looks fine. If not, you are missing the
> >last unplug :-)
> 
> That's not the job of bdev_put(): the final unplug is done at the end 
> of blkio_schedule -- the same place that I do a run_task_queue() when 
> compiling for Linux 2.4.

Thanks for confirming, that sounds fine.

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 17:55 poor domU VBD performance Ian Pratt
  2005-03-31 18:04 ` Jens Axboe
@ 2005-03-31 20:49 ` Andrew Theurer
  2005-03-31 21:15   ` Keir Fraser
  2005-04-01 16:36 ` peter bier
  2 siblings, 1 reply; 60+ messages in thread
From: Andrew Theurer @ 2005-03-31 20:49 UTC (permalink / raw)
  To: Ian Pratt, Jens Axboe, Keir Fraser
  Cc: Xen development list, Vincent Hanquez, Philip R Auld,
	Kurt Garloff, Christian Limpach

On Thursday 31 March 2005 11:55, Ian Pratt wrote:
> > > I've checked in something along the lines of what you described
> > > into both the 2.0-testing and the unstable trees. Looks to have
> > > identical performance to the original simple patch, at least for
> > > a bulk 'dd'.
> >
> > Can you post the patch here for review? Or just point me
> > somewhere I can view it.
>
> Jens,
>
> Thanks for your help on this.

BTW, I am now getting this with xen-unstable:

Process xenblkd (pid: 730, threadinfo=f7cc4000 task=f7c42510)
Stack: c022d172 f44b1a08 f363c6f0 f7cc4000 c046d40c c02849f8 f44b1a08 00000010
       00000000 f7c42510 c0115b0a 00000000 00000000 f7c42510 c17f1e48 c01092e6
       00000000 f7c42510 c0115b0a 00100100 00200200 00000000 00000000 00000000
Call Trace:
 [<c022d172>] blk_run_queue+0x38/0x91
 [<c02849f8>] blkio_schedule+0x126/0x149
 [<c0115b0a>] default_wake_function+0x0/0x12
 [<c01092e6>] ret_from_fork+0x6/0x1c
 [<c0115b0a>] default_wake_function+0x0/0x12
 [<c02848d2>] blkio_schedule+0x0/0x149
 [<c0107571>] kernel_thread_helper+0x5/0xb
Code:  Bad EIP value.
 <6>note: xenblkd[730] exited with preempt_count 1

* Re: poor domU VBD performance.
  2005-03-31 20:49 ` Andrew Theurer
@ 2005-03-31 21:15   ` Keir Fraser
  2005-03-31 21:27     ` Andrew Theurer
  2005-04-01  5:43     ` Jens Axboe
  0 siblings, 2 replies; 60+ messages in thread
From: Keir Fraser @ 2005-03-31 21:15 UTC (permalink / raw)
  To: Andrew Theurer
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach


On 31 Mar 2005, at 21:49, Andrew Theurer wrote:

> BTW, I am now getting this with xen-unstable:
>
> Process xenblkd (pid: 730, threadinfo=f7cc4000 task=f7c42510)
> Stack: c022d172 f44b1a08 f363c6f0 f7cc4000 c046d40c c02849f8 f44b1a08 00000010
>        00000000 f7c42510 c0115b0a 00000000 00000000 f7c42510 c17f1e48 c01092e6
>        00000000 f7c42510 c0115b0a 00100100 00200200 00000000 00000000 00000000

I wonder if blk_run_queue() is not the right thing to call. For 
example, it ignores whether the queue has been forcibly stopped by the 
underlying driver and doesn't check whether there are any requests that 
actually require pushing. Plus various drivers (swraid and probably 
lvm) have their own unplug function and blk_run_queue doesn't handle 
that.

Could you try again, but replace calls to blk_run_queue(plugged_queue) 
in blkback.c with:
    if ( plugged_queue->unplug_fn )
           plugged_queue->unplug_fn(plugged_queue);

This looks like a better match with what various other drivers do (e.g. 
swraid).
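
The shape of that suggestion, as a userspace sketch: each queue carries
its own unplug hook, which a stacked device can replace, and the caller
goes through the hook only if one is installed. The model_ names are
invented for this example and are not the kernel's types.

```c
#include <assert.h>
#include <stddef.h>

struct model_queue;
typedef void (*model_unplug_fn)(struct model_queue *q);

struct model_queue {
    model_unplug_fn unplug_fn;  /* may be NULL; stacked devices supply their own */
    int unplugged;              /* how many times the hook ran */
};

/* Example hook: a real driver would start its pending I/O here. */
void model_generic_unplug(struct model_queue *q)
{
    q->unplugged++;
}

/* Kick a queue through its own hook; do nothing if none is installed. */
void model_kick(struct model_queue *q)
{
    assert(q != NULL);
    if (q->unplug_fn != NULL)
        q->unplug_fn(q);
}
```

A queue without a hook is simply skipped rather than called through a
NULL pointer.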

  Thanks,
  Keir

* Re: poor domU VBD performance.
  2005-03-31 21:15   ` Keir Fraser
@ 2005-03-31 21:27     ` Andrew Theurer
  2005-04-01  5:43     ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-31 21:27 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

> Could you try again, but replace calls to
> blk_run_queue(plugged_queue) in blkback.c with:
>     if ( plugged_queue->unplug_fn )
>            plugged_queue->unplug_fn(plugged_queue);
>
> This looks like a better match with what various other drivers do
> (e.g. swraid).

Sorry, it looks like I did not get you the whole output, but I can try 
what you suggested.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = ma 00000000 pa 55555000
 [<c02691f1>] blkio_schedule+0x126/0x14c
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c02690cb>] blkio_schedule+0x0/0x14c
 [<c01071f1>] kernel_thread_helper+0x5/0xb
Oops: 0000 [#1]
Modules linked in: ipt_MASQUERADE iptable_nat ip_conntrack ip_tables qla2300 qla2xxx scsi_transport_fc mptscsih mptbase
CPU:    0
EIP:    0061:[<00000000>]    Not tainted VLI
EFLAGS: 00010282   (2.6.11-xen0-up)
EIP is at 0x0
eax: 00000000   ebx: f5ec9b70   ecx: f3e13b64   edx: 00000000
esi: 00000000   edi: c044540c   ebp: f7d8dfc0   esp: f7d8df84
ds: 007b   es: 007b   ss: 0069
Process xenblkd (pid: 730, threadinfo=f7d8c000 task=f7d43020)
Stack: c0217f7d f5ec9b70 f3fee6f0 f7d8c000 c02691f1 f5ec9b70 00000010 00000000
       f7d43020 c01147b5 00000000 00000000 fbffc000 00000000 f7d8c000 00000000
       f7d43020 c01147b5 00100100 00200200 00000000 00000000 00000000 c02690cb
Call Trace:
 [<c0217f7d>] blk_run_queue+0x24/0x47
 [<c02691f1>] blkio_schedule+0x126/0x14c
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c01147b5>] default_wake_function+0x0/0x12
 [<c02690cb>] blkio_schedule+0x0/0x14c
 [<c01071f1>] kernel_thread_helper+0x5/0xb
Code:  Bad EIP value.

>
>   Thanks,
>   Keir

* RE: poor domU VBD performance.
@ 2005-03-31 21:32 Ian Pratt
  2005-03-31 22:13 ` Andrew Theurer
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-03-31 21:32 UTC (permalink / raw)
  To: Keir Fraser, Andrew Theurer
  Cc: Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Jens Axboe, Christian Limpach

> Could you try again, but replace calls to 
> blk_run_queue(plugged_queue) in blkback.c with:
>     if ( plugged_queue->unplug_fn )
>            plugged_queue->unplug_fn(plugged_queue);
> 
> This looks like a better match with what various other 
> drivers do (e.g. swraid).

This patch is required to make it work with LVM. 2.0-testing and
unstable will be updated shortly...

Ian 

* Re: poor domU VBD performance.
  2005-03-31 18:27                   ` Kurt Garloff
@ 2005-03-31 21:59                     ` Nivedita Singhvi
  0 siblings, 0 replies; 60+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 21:59 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Xen development list, Philip R Auld, Vincent Hanquez,
	Jens Axboe, Christian Limpach

Kurt Garloff wrote:

> Hi Niv,
> 
> On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> 
>>Although the usual answer for what scheduling algorithm is
>>best is almost always "depends on the workload", it was
>>suggested to me that the cfq was still the best option to
>>go with. What do people feel about that? (Or is AS going
>>to remain default?).
> 
> 
> This is a different discussion.

Yes, I did change the subject a little ;).

> But, yes, I would agree that CFQ (v3) is the best default choice.

Yep, even though some of the complications in the Xen
environment (as you point out below) will have to be addressed.

> Jens, should we maybe make sure that the blockback driver does use 
> different (fake) UIDs for the domains that it serves to provide 
> the fairness between them. Next step would be to allow to tweak 
> IO priorities. Or, to make it more general, add a parameter (call
> it uid), that a block driver can pass down to the IO scheduler
> and that would normally be current->uid but may be set differently?


> It's part of 2.6.11.
> garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
> noop anticipatory deadline [cfq]

I just saw Jens' reply as well. This is much goodness :).
Very handy indeed!

thanks,
Nivedita

* Re: poor domU VBD performance.
  2005-03-31 21:32 Ian Pratt
@ 2005-03-31 22:13 ` Andrew Theurer
  0 siblings, 0 replies; 60+ messages in thread
From: Andrew Theurer @ 2005-03-31 22:13 UTC (permalink / raw)
  To: Ian Pratt, Keir Fraser
  Cc: Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

On Thursday 31 March 2005 15:32, Ian Pratt wrote:
> > Could you try again, but replace calls to
> > blk_run_queue(plugged_queue) in blkback.c with:
> >     if ( plugged_queue->unplug_fn )
> >            plugged_queue->unplug_fn(plugged_queue);
> >
> > This looks like a better match with what various other
> > drivers do (e.g. swraid).

OK, the change worked for me, but there still seems to be some minimum
latency here (though it is much better):

      reqsze    KB/sec    svcmt

xenU    16k     6266.67   1.25
        32k    12618.67   1.20
        64k    25002.67   1.28
       128k    49322.67   1.35
       256k    58538.67   3.15

xen0    16k    13818.67   1.15
        32k    27573.33   1.16
        64k    54784.00   1.16
       128k    58581.33   2.18
       256k    58453.33   4.38

noXen   16k    58679.19   0.27
        32k    58453.33   0.54
        64k    58713.04   1.08
       128k    58174.09   2.17
       256k    58820.07   4.36
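
A quick sanity check on these numbers, as a sketch: assuming svcmt is
the per-request service time in milliseconds, the throughput column is
in KB/s (the disk tops out around 58 MB/s), and only one request is in
flight at a time, the rate a device could sustain is roughly request
size divided by service time. The helper names are made up for this
example.

```c
#include <assert.h>

/*
 * KB/s a single stream would reach if every request took svc_ms
 * milliseconds and requests ran back to back with no gap.
 */
double implied_kb_per_sec(double req_kb, double svc_ms)
{
    assert(svc_ms > 0.0);
    return req_kb * 1000.0 / svc_ms;
}

/*
 * Fraction of that rate actually measured. A value well below 1.0
 * suggests idle time between requests rather than slow requests.
 */
double busy_fraction(double measured_kb_per_sec, double req_kb, double svc_ms)
{
    return measured_kb_per_sec / implied_kb_per_sec(req_kb, svc_ms);
}
```

Under those assumptions the noXen and xen0 16k rows come out near 99%
of the implied rate, while the xenU 16k row is near 49%, which would
point at idle time between requests rather than at slower individual
requests.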




	

* RE: poor domU VBD performance.
@ 2005-03-31 22:36 Ian Pratt
  2005-03-31 23:05 ` Andrew Theurer
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-03-31 22:36 UTC (permalink / raw)
  To: Andrew Theurer, Keir Fraser
  Cc: Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

>       reqsze    KB/sec    svcmt
> 
> xenU    16k     6266.67   1.25
>         32k    12618.67   1.20
>         64k    25002.67   1.28
>        128k    49322.67   1.35
>        256k    58538.67   3.15
> 
> xen0    16k    13818.67   1.15
>         32k    27573.33   1.16
>         64k    54784.00   1.16
>        128k    58581.33   2.18
>        256k    58453.33   4.38
> 
> noXen   16k    58679.19   0.27
>         32k    58453.33   0.54
>         64k    58713.04   1.08
>        128k    58174.09   2.17
>        256k    58820.07   4.36

These figures for xen0 are interesting. It's odd that we tail off so
badly for short requests. What interrupt rates are occurring when you do
these tests?

Thanks,
ian

* Re: poor domU VBD performance.
  2005-03-31 22:36 Ian Pratt
@ 2005-03-31 23:05 ` Andrew Theurer
  2005-04-01 21:40   ` Cédric Schieli
  0 siblings, 1 reply; 60+ messages in thread
From: Andrew Theurer @ 2005-03-31 23:05 UTC (permalink / raw)
  To: Ian Pratt, Keir Fraser
  Cc: Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

On Thursday 31 March 2005 16:36, Ian Pratt wrote:
> >       reqsze    KB/sec    svcmt
> >
> > xenU    16k     6266.67   1.25
> >         32k    12618.67   1.20
> >         64k    25002.67   1.28
> >        128k    49322.67   1.35
> >        256k    58538.67   3.15
> >
> > xen0    16k    13818.67   1.15
> >         32k    27573.33   1.16
> >         64k    54784.00   1.16
> >        128k    58581.33   2.18
> >        256k    58453.33   4.38
> >
> > noXen   16k    58679.19   0.27
> >         32k    58453.33   0.54
> >         64k    58713.04   1.08
> >        128k    58174.09   2.17
> >        256k    58820.07   4.36
>
> These figures for xen0 are interesting. It's odd that we tail off so
> badly for short requests. What interrupt rates are occurring when you
> do these tests?

I just ran again, and for some reason it looks fine now...  I have no 
idea what I did to get the lower numbers initially, perhaps an 
inadvertent IO scheduler change.  Service commit times are .28ms and I 
can drive ~58MB/sec with just 16k requests on xen0.  I'll do some more 
tests to get a more consistent picture.

-Andrew

* Re: poor domU VBD performance.
  2005-03-31 21:15   ` Keir Fraser
  2005-03-31 21:27     ` Andrew Theurer
@ 2005-04-01  5:43     ` Jens Axboe
  1 sibling, 0 replies; 60+ messages in thread
From: Jens Axboe @ 2005-04-01  5:43 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Andrew Theurer,
	Philip R Auld, Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 21:49, Andrew Theurer wrote:
> 
> >BTW, I am now getting this with xen-unstable:
> >
> >Process xenblkd (pid: 730, threadinfo=f7cc4000 task=f7c42510)
> >Stack: c022d172 f44b1a08 f363c6f0 f7cc4000 c046d40c c02849f8 f44b1a08 00000010
> >       00000000 f7c42510 c0115b0a 00000000 00000000 f7c42510 c17f1e48 c01092e6
> >       00000000 f7c42510 c0115b0a 00100100 00200200 00000000 00000000 00000000
> 
> I wonder if blk_run_queue() is not the right thing to call. For 
> example, it ignores whether the queue has been forcibly stopped by the 
> underlying driver and doesn't check whether there are any requests that 
> actually require pushing. Plus various drivers (swraid and probably 
> lvm) have their own unplug function and blk_run_queue doesn't handle 
> that.
> 
> Could you try again, but replace calls to blk_run_queue(plugged_queue) 
> in blkback.c with:
>    if ( plugged_queue->unplug_fn )
>           plugged_queue->unplug_fn(plugged_queue);
> 
> This looks like a better match with what various other drivers do (e.g. 
> swraid).

Yes you are right, you really want to just unplug it. That should work
correctly in all cases. Remember that ->unplug_fn must not be called
with any locks held.

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 17:55 poor domU VBD performance Ian Pratt
  2005-03-31 18:04 ` Jens Axboe
  2005-03-31 20:49 ` Andrew Theurer
@ 2005-04-01 16:36 ` peter bier
  2 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-04-01 16:36 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> 
> > > I've checked in something along the lines of what you described
> > > into both the 2.0-testing and the unstable trees. Looks to have
> > > identical performance to the original simple patch, at least for
> > > a bulk 'dd'.
> > 
> > Can you post the patch here for review? Or just point me 
> > somewhere I can view it.
> 
> Jens,
> 
> Thanks for your help on this.
> 
> Here's Keir's updated patch:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424c1abd7LgWMiaskLEEAAX7ffdkXQ
> 
> Which is based on this earlier patch from you:
> http://xen.bkbits.net:8080/xen-2.0-testing.bk/gnupatch@424bba4091aV1FuNksY_4w_z4Tvr3g
> 
> Best,
> Ian
> 
I have applied the patch in blkback.c for xen0 and now get good results.
I have tested two systems, one with a standard IDE disk and another with
two SATA disks. I stumbled over this issue when I was doing filesystem IO and
wanted to check the efficiency of xen-linux. It was then that I went to raw IO
on block devices and found that it didn't perform as I hoped. 

Now I have switched back to the filesystem operations. I do this by copying a
"/usr" subtree from a slackware-10.0 installation containing about 750 MB in 
2200 directories and 37000 files. Copying these files with the target directory
on the same device as the source directory, DomU reaches between 90 and 93% of
the Dom0 performance. When copying from a directory on one device into a
directory on another device, DomU lags further behind Dom0: it reaches only 50
to 60 percent of the Dom0 performance. That is worse than when using only one
disk. I found out that the sum of the utilization of the two disks, as reported
by iostat on Dom0, is always slightly above 100%. Does this reflect that the
reading and the writing both go through the VBD driver? Neither device is ever
100% busy.

Any explanations ? 

Thanks in advance 

   Peter

* Re: poor domU VBD performance.
  2005-03-31 23:05 ` Andrew Theurer
@ 2005-04-01 21:40   ` Cédric Schieli
  0 siblings, 0 replies; 60+ messages in thread
From: Cédric Schieli @ 2005-04-01 21:40 UTC (permalink / raw)
  To: Andrew Theurer
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach


> I just ran again, and for some reason it looks fine now...  I have no 
> idea what I did to get the lower numbers initially, perhaps an 
> inadvertent IO scheduler change.  Service commit times are .28ms and I 
> can drive ~58MB/sec with just 16k requests on xen0.  I'll do some more 
> tests to get a more consistent picture.
> 

I still experience bad performance in domU with latest xen-testing dom0.

Here's my setup :

Xen : 2.0.5
Dom0 : 2.6.11-xen-testing (20050401 ~22h CEST) running Debian Sarge
DomU : 2.6.10-xen-2.0.5 (8G LVM backed VBDs exported as hda1) running
Gentoo
Processor : AthlonXP 1800+
Chipset : VIA KT600
Drive : Seagate ST380013AS 80G SATA

And my results :

Dom0 : 51 MB/s
DomU : 36 MB/s

I've tried request sizes from 128k to 1024k, reading the entire volume,
and always obtained the same results.
Changing the scheduler on Dom0 and/or DomU doesn't change anything.

I can give you more info if needed.

--
Cédric Schieli


* RE: poor domU VBD performance.
@ 2005-04-01 23:22 Ian Pratt
  2005-04-02 10:36 ` Cédric Schieli
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-04-01 23:22 UTC (permalink / raw)
  To: Cédric Schieli, Andrew Theurer
  Cc: Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

 
There have been some changes to the frontend driver too: you might want to try using the 2.0-testing kernel in domU too.

Also, a really nasty CPU performance bug got fixed earlier this evening, so you should make sure you have the latest tree.

Ian



* RE: poor domU VBD performance.
  2005-04-01 23:22 Ian Pratt
@ 2005-04-02 10:36 ` Cédric Schieli
  2005-04-02 19:54   ` peter bier
  0 siblings, 1 reply; 60+ messages in thread
From: Cédric Schieli @ 2005-04-02 10:36 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Kurt Garloff, Andrew Theurer, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

I've just tried with the latest testing tree for Dom0 and DomU and got the
same results.

On Saturday 2 April 2005 at 00:22 +0100, Ian Pratt wrote:
>  There have been some changes to the frontend driver too: you might want to try using the 2.0-testing kernel in domU too.
> 
> Also, a really nasty CPU performance bug got fixed earlier this evening, so you should make sure you have the latest tree.
> 
> Ian


* RE: poor domU VBD performance.
@ 2005-04-02 10:56 Ian Pratt
  2005-04-02 12:10 ` Cédric Schieli
  0 siblings, 1 reply; 60+ messages in thread
From: Ian Pratt @ 2005-04-02 10:56 UTC (permalink / raw)
  To: Cédric Schieli
  Cc: Xen development list, Kurt Garloff, Andrew Theurer, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach


> > > Xen : 2.0.5
> > > Dom0 : 2.6.11-xen-testing (20050401 ~22h CEST) running Debian Sarge
> > > DomU : 2.6.10-xen-2.0.5 (8G LVM backed VBDs exported as hda1) running Gentoo
> > > Processor : AthlonXP 1800+
> > > Chipset : VIA KT600
> > > Drive : Seagate ST380013AS 80G SATA
> > > 
> > > And my results :
> > > 
> > > Dom0 : 51 MB/s
> > > DomU : 36 MB/s
> > > 
> > > I've tried with request sizes from 128k to 1024k reading entire 
> > > volume and obtained always same results.
> > > Changing the scheduler on Dom0 and/or DomU doesn't change anything.

Are you sure you're reading from the exact same part of the disk in both instances? 
How are you doing the bandwidth measurements? 'dd'?

Ian


* RE: poor domU VBD performance.
  2005-04-02 10:56 Ian Pratt
@ 2005-04-02 12:10 ` Cédric Schieli
  0 siblings, 0 replies; 60+ messages in thread
From: Cédric Schieli @ 2005-04-02 12:10 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Kurt Garloff, Andrew Theurer, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach

> Are you sure you're reading from the exact same part of the disk in both instances? 
> How are you doing the bandwidth measurements? 'dd'?

I have this line in my DomU conf :
disk = [ 'phy:vg/gentoo-root,hda1,w' ,'phy:vg/gentoo-swap,hda2,w' ]

I make my measurements with :
Dom0 : dd if=/dev/vg/gentoo-root of=/dev/null bs={128|256|...}k
DomU : dd if=/dev/hda1 of=/dev/null bs={128|256|...}k

In all cases I get the same results: 50-52 MB/s on Dom0, 34-37 MB/s on DomU.
I've tried every combination of schedulers.

I will try with the latest xen-testing hypervisor (I still use 2.0.5 for the
moment) but I don't expect it to make much difference.
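
For anyone who wants to repeat this kind of measurement without dd, here is a minimal Python sketch of the same sequential read test. The function name read_throughput is made up for illustration, and the path you pass is whatever applies on your system (for example the LVM volume in Dom0 or /dev/hda1 in DomU). Like the dd runs above, it does nothing about the page cache, so a short read over a small region may report cached speeds rather than disk speeds.

```python
import time


def read_throughput(path, block_size, max_bytes=None):
    """Read `path` sequentially in `block_size` chunks.

    Returns (bytes_read, megabytes_per_second), with 1 MB = 10**6 bytes.
    `max_bytes` optionally stops the run early; None reads to the end.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 issues raw reads of at most block_size bytes each,
    # which is roughly what dd does with bs=<block_size>.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:  # end of file or device
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / 1e6 / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Calling read_throughput(path, kib * 1024) for kib in 128, 256, 512 and 1024 on both sides would cover the same request sizes as the dd runs; comparing the two rates gives the Dom0/DomU ratio directly.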


* Re: poor domU VBD performance.
  2005-04-02 10:36 ` Cédric Schieli
@ 2005-04-02 19:54   ` peter bier
  0 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-04-02 19:54 UTC (permalink / raw)
  To: xen-devel

Cédric Schieli <cedric <at> schieli.dyndns.org> writes:

> 
> I've just tried with latest testing tree Dom0 and DomU and got same
> results.
> 
> On Saturday 2 April 2005 at 00:22 +0100, Ian Pratt wrote:
> > There have been some changes to the frontend driver too: you might want to
> > try using the 2.0-testing kernel in domU too.
> > 
> > Also, a really nasty CPU performance bug got fixed earlier this evening, so
> > you should make sure you have the latest tree.
> > 
> > Ian

I just stumbled across the fact that you are using a SATA disk, Cédric. This
is exactly the "dd" behavior that my system containing SATA disks still shows.
But it applies only to "dd" (which, admittedly, is read-only). It does not
apply to the performance figures I got when copying my "/usr" tree, as
described in a previous post here, from one location on the disk to another
location on the same disk (which, of course, is combined read-write on the
same device). Hence it might be possible that my limited performance when
copying from one disk to another is in fact an effect of reduced read
performance in DomU on a SATA disk.

I suspect that this might be an effect specific to SATA disks. I will verify
this on Monday, when I have access to my computers in the office, by repeating
the test on a system with two IDE disks. I will report the result then, if
your problem is still open.

I will describe the exact configuration of the systems then (motherboard, IO
controller, etc.).

Peter 


* Re: poor domU VBD performance.
  2005-04-04 22:35 ` Nicholas Lee
@ 2005-04-12 10:29   ` peter bier
  0 siblings, 0 replies; 60+ messages in thread
From: peter bier @ 2005-04-12 10:29 UTC (permalink / raw)
  To: xen-devel

I am sorry to return to this issue after quite a long interruption.
As I mentioned in an earlier post, I came across this problem when I
was testing file-system performance. After the problems with raw sequential
I/O seemed to have been fixed in the testing release, I turned back to
my original problem.
I did a simple test that, despite its simplicity, seems to put the IO subsystem
under considerable stress. I took the /usr tree of my system and copied
it five times into different directories on a slice of disk 1. This tree
consists of 36000 files with about 750 MB of data. Then I started
to copy each of these copies recursively onto disk 2 (each to its own
location on that disk, of course). I ran these copies
in parallel, and the processes took about 6 to 7 minutes in DOM0, while they
needed between 14.6 and 15.9 minutes in DOMU.

Essentially, this means that under this heavy IO load I get
back to the 40% ratio between IO performance in DOMU and IO performance
in DOM0 that I initially reported. This may just be coincidence, but
it is probably worth mentioning.
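
To make the ratio explicit, here is a small Python sketch of the arithmetic. It assumes that both runs move the same amount of data, so relative throughput is simply the inverse ratio of the elapsed times; the variable and function names are only illustrative.

```python
# Elapsed times in minutes, as reported above (lowest and highest observed).
dom0_minutes = (6.0, 7.0)
domu_minutes = (14.6, 15.9)


def relative_throughput(dom0_time, domu_time):
    """DOMU throughput as a fraction of DOM0 throughput for the same workload."""
    return dom0_time / domu_time


# Pair the extremes of both ranges to bound the ratio from below and above.
worst = relative_throughput(min(dom0_minutes), max(domu_minutes))  # 6.0 / 15.9
best = relative_throughput(max(dom0_minutes), min(domu_minutes))   # 7.0 / 14.6

print(f"DOMU runs at roughly {worst:.0%} to {best:.0%} of DOM0 speed")
```

Depending on which ends of the two ranges are paired, this gives a bit under 40% up to a bit under 50%, so the 40% figure sits at the lower end of what the timings support.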

I monitored the disk and block-IO activity with iostat. The output is
too large to post here, so I will only include a few
representative snapshots. The first two snapshots show the activity while
copying in DOMU.

This is a snapshot of a phase with relatively high throughput (DOMU):

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
hde          0.00  2748.00     1.60    71.20    12.80 22561.60     6.40 11280.80   310.09     1.78    23.96     4.73    34.40
hdg       2571.00     5.00   126.80     9.60 21580.80   115.20 10790.40    57.60   159.06     5.48    40.38     6.61    90.20

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    6.20    0.20   93.40

This is a snapshot of a phase with relatively low throughput (DOMU):

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
hde          0.00   676.40     0.00    33.00     0.00  5678.40     0.00  2839.20   172.07     1.76    53.45     4.91    16.20
hdg        335.80    11.00   315.00     3.40  5206.40   115.20  2603.20    57.60    16.71     4.15    13.02     2.76    87.80

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    9.00    0.00   90.80

I suspect that the iowait reported in the CPU usage is not entirely correct,
but I am not sure about it.

The next two snapshots show iostat output during the copying in DOM0.

Again, the first snapshot was taken in a phase of relatively high throughput
(DOM0):

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
hde          0.00  5845.40     1.40   110.20    11.20 47812.80     5.60 23906.40   428.53   105.96   772.63     8.96   100.00
hdg         46.20    24.80   389.80     2.20 47628.80   216.00 23814.40   108.00   122.05     7.12    18.23     3.30   129.40

avg-cpu:  %user   %nice %system %iowait   %idle
           2.40    0.00   40.20   57.40    0.00

The next snapshot was taken in a phase of relatively low throughput (DOM0):

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
hde          0.00   903.40     0.20   106.80     3.20  7972.80     1.60  3986.40    74.54    20.77   217.91     4.06    43.40
hdg          0.00    24.00   746.60     1.20  9302.40   200.00  4651.20   100.00    12.71     4.96     6.67     1.34   100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           3.40    0.00   44.00   52.60    0.00

The problem seems to be the reading. The device hde, which contains the slice
the data is copied onto, is almost never really busy when using DOMU.
The ratio of kB/s written to utilization seems to show that writing from DOMU
is just as efficient as writing from DOM0 (writing can be buffered in
both cases, after all).
Yet the information on reading seems to show a different picture. The block
layer constantly merges requests, resulting in request sizes that are
approximately equal in both cases. Yet service times for DOMU requests are
about twice those for DOM0 requests.
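
The two comparisons above can be recomputed from the snapshots with a short Python sketch. The numbers are copied from the four iostat snapshots in this message (wkB/s and %util for hde, svctm for hdg); the dictionary layout and function names are only illustrative. Since one of the %util values reported in DOM0 exceeds 100, the per-utilization figures should be read as rough indications rather than exact measurements.

```python
# hde is the disk being written to, hdg the disk being read from.
SNAPSHOTS = {
    ("DOMU", "high"): {"hde_wkB_s": 11280.80, "hde_util": 34.40, "hdg_svctm": 6.61},
    ("DOMU", "low"): {"hde_wkB_s": 2839.20, "hde_util": 16.20, "hdg_svctm": 2.76},
    ("DOM0", "high"): {"hde_wkB_s": 23906.40, "hde_util": 100.00, "hdg_svctm": 3.30},
    ("DOM0", "low"): {"hde_wkB_s": 3986.40, "hde_util": 43.40, "hdg_svctm": 1.34},
}


def write_kb_per_util(domain, phase):
    """kB/s written to hde per percentage point of hde utilization."""
    s = SNAPSHOTS[(domain, phase)]
    return s["hde_wkB_s"] / s["hde_util"]


def read_svctm_ratio(phase):
    """Service time on hdg in DOMU divided by service time on hdg in DOM0."""
    return SNAPSHOTS[("DOMU", phase)]["hdg_svctm"] / SNAPSHOTS[("DOM0", phase)]["hdg_svctm"]


for phase in ("high", "low"):
    print(
        f"{phase:>4}: write kB/s per %util  DOMU={write_kb_per_util('DOMU', phase):.1f}"
        f"  DOM0={write_kb_per_util('DOM0', phase):.1f}"
        f"  | read svctm DOMU/DOM0={read_svctm_ratio(phase):.2f}"
    )
```

In both phases the write side in DOMU moves at least as many kB/s per percent of utilization as DOM0 does, while the read service time ratio comes out very close to 2.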

I do not know whether such a scenario is simply unsuitable for virtual
systems, at least under Xen. We are thinking about running a mail gateway on
top of a protected and secured dom0 system, and potentially offering other
network services in separate domains. We want to avoid corruption of DOM0
while being able to offer "insecure" services in nonprivileged domains.
We know that mail servicing can potentially put an intense load on the
filesystem, admittedly more on inodes (create and delete) than on data
throughput.

Do I simply have to accept that under heavy IO load domains using VBDs to
access storage devices will lag behind dom0 and native Linux systems, or is
there a chance to fix this?

My reported test was done on a Fujitsu-Siemens RX100 system with a 2.0 GHz
Celeron CPU and a total of only 256 MB of memory. DOM0 had 128 MB and DOMU
100 MB. The disks were plain IDE disks. I did the same test on a system
with 1.25 GB of RAM, with both domains having 0.5 GB of memory. It contains
SATA disks and the results are essentially the same; the only difference is
that both runs are slower because the disks deliver less throughput under
random access.

Any advice or help?

Thanks in advance 

    Peter 

