Re: RE: RE: poor domU VBD performance.


* Re: RE: RE: poor domU VBD performance.
       [not found] <A95E2296287EAD4EB592B5DEEFCE0E9D1E3905@liverpoolst.ad.cl.cam.ac.uk>
@ 2005-03-29 22:45 ` Kurt Garloff
  2005-03-30  8:53   ` Jens Axboe
  2005-03-30 10:00   ` Kurt Garloff
  0 siblings, 2 replies; 29+ messages in thread
From: Kurt Garloff @ 2005-03-29 22:45 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Vincent Hanquez, Jens Axboe,
	Christian Limpach



Hi Ian,

On Tue, Mar 29, 2005 at 07:09:50PM +0100, Ian Pratt wrote:
> We'd really appreciate your help on this, or from someone else at SuSE
> who actually understands the Linux block layer?

I'm Cc'ing Jens ...
 
> In the 2.6 blkfront driver, what scheduler should we be registering
> with? What should we be setting as max_sectors? Are there other
> parameters we should be setting that we aren't? (block size?)

I think noop is a good choice for secondary domains, as you don't
want to be too clever there, otherwise you stack a clever scheduler
on top of a clever scheduler. noop basically only does front- and
backmerging to make the request sizes larger.

But you probably should initialize the readahead sectors.

Please test attached patch.

It fixed the problem for me, but my testing was very limited,
I only had a small loopback mounted root fs to test with quickly.

Note that initializing to 256 (128k) would be OK as well (and might 
be the better default); it seems to be set to 256 (128k) by default, 
but it's not ... If you explicitly set it to 256, the performance 
still increases tremendously.

> In the blkback driver that actually issues the IO's in dom0, is there
> something we should be doing to cause IOs to get batched? In 2.4 we used
> a task_queue to push the IO through to the disk having queued it with
> generic_make_request(). In 2.6 we're currently using submit_bio() and
> just hoping that batching happens.

I don't think the blkback driver does anything wrong here.

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

[-- Attachment #1.1.2: xen-blkfront-ra.diff --]
[-- Type: text/plain, Size: 840 bytes --]

From: Kurt Garloff <garloff@suse.de>
Subject: Initialize readahead in vbd Q init code

The domU read performance is poor without readahead, so
better make sure we initialize this value.

Signed-off-by: Kurt Garloff <garloff@suse.de>

Index: linux-2.6.11/drivers/xen/blkfront/vbd.c
===================================================================
--- linux-2.6.11.orig/drivers/xen/blkfront/vbd.c
+++ linux-2.6.11/drivers/xen/blkfront/vbd.c
@@ -268,8 +268,11 @@ static struct gendisk *xlvbd_get_gendisk
             xlbd_blk_queue, BLKIF_MAX_SEGMENTS_PER_REQUEST);
 
         /* Make sure buffer addresses are sector-aligned. */
         blk_queue_dma_alignment(xlbd_blk_queue, 511);
+
+	/* Set readahead */
+	blk_queue_max_sectors(xlbd_blk_queue, 512);
     }
     gd->queue = xlbd_blk_queue;
 
     add_disk(gd);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* Re: RE: RE: poor domU VBD performance.
  2005-03-29 22:45 ` RE: " Kurt Garloff
@ 2005-03-30  8:53   ` Jens Axboe
  2005-03-30 10:00   ` Kurt Garloff
  1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-30  8:53 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Xen development list, Vincent Hanquez,
	Christian Limpach

On Wed, Mar 30 2005, Kurt Garloff wrote:
> Hi Ian,
> 
> On Tue, Mar 29, 2005 at 07:09:50PM +0100, Ian Pratt wrote:
> > We'd really appreciate your help on this, or from someone else at SuSE
> > who actually understands the Linux block layer?
> 
> I'm Cc'ing Jens ...
>  
> > In the 2.6 blkfront driver, what scheduler should we be registering
> > with? What should we be setting as max_sectors? Are there other
> > parameters we should be setting that we aren't? (block size?)
> 
> I think noop is a good choice for secondary domains, as you don't
> want to be too clever there, otherwise you stack a clever scheduler
> on top of a clever scheduler. noop basically only does front- and
> backmerging to make the request sizes larger.
> 
> But you probably should initialize the readahead sectors.
> 
> Please test attached patch.
> 
> It fixed the problem for me, but my testing was very limited,
> I only had a small loopback mounted root fs to test with quickly.
> 
> Note that initializing to 256 (128k) would be OK as well (and might 
> be the better default); it seems to be set to 256 (128k) by default, 
> but it's not ... If you explicitly set it to 256, the performance 
> still increases tremendously.
> 
> > In the blkback driver that actually issues the IO's in dom0, is there
> > something we should be doing to cause IOs to get batched? In 2.4 we used
> > a task_queue to push the IO through to the disk having queued it with
> > generic_make_request(). In 2.6 we're currently using submit_bio() and
> > just hoping that batching happens.
> 
> I don't think the blkback driver does anything wrong here.
> 
> Regards,
> -- 
> Kurt Garloff, Director SUSE Labs, Novell Inc.

> From: Kurt Garloff <garloff@suse.de>
> Subject: Initialize readahead in vbd Q init code
> 
> The domU read performance is poor without readahead, so
> better make sure we initialize this value.
> 
> Signed-off-by: Kurt Garloff <garloff@suse.de>
> 
> Index: linux-2.6.11/drivers/xen/blkfront/vbd.c
> ===================================================================
> --- linux-2.6.11.orig/drivers/xen/blkfront/vbd.c
> +++ linux-2.6.11/drivers/xen/blkfront/vbd.c
> @@ -268,8 +268,11 @@ static struct gendisk *xlvbd_get_gendisk
>              xlbd_blk_queue, BLKIF_MAX_SEGMENTS_PER_REQUEST);
>  
>          /* Make sure buffer addresses are sector-aligned. */
>          blk_queue_dma_alignment(xlbd_blk_queue, 511);
> +
> +	/* Set readahead */
> +	blk_queue_max_sectors(xlbd_blk_queue, 512);

This isn't read-ahead, it's the max request size setting. The actual
read-ahead setting is in q->backing_dev_info.ra_pages.

There is a helper function for this type of stacking,
blk_queue_stack_limits(). You call it after setting up your own queue:

        blk_queue_stack_limits(my_queue, bottom_queue);

I'll check the xen block driver to see if there's anything else that
sticks out.
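
The units are easy to mix up here: max_sectors and the 256/512 figures
quoted above are counted in 512-byte sectors, while ra_pages is counted in
pages. A minimal user-space sketch of the conversion follows. It assumes
4 KiB pages, and the helper names are made up for illustration; they are
not kernel APIs.

```c
#include <assert.h>

#define SECTOR_BYTES     512u                         /* block-layer sector */
#define PAGE_BYTES       4096u                        /* assumed page size */
#define SECTORS_PER_KIB  (1024u / SECTOR_BYTES)       /* 2 */
#define SECTORS_PER_PAGE (PAGE_BYTES / SECTOR_BYTES)  /* 8 */

/* Size of a run of sectors in KiB (rounded down), e.g. 256 -> 128. */
static unsigned int sectors_to_kib(unsigned int sectors)
{
    return sectors / SECTORS_PER_KIB;
}

/* Same size in whole pages (rounded up), the unit used by ra_pages. */
static unsigned int sectors_to_pages(unsigned int sectors)
{
    return sectors / SECTORS_PER_PAGE + (sectors % SECTORS_PER_PAGE != 0);
}

/* Small self-check using the figures from this thread. */
static void check_sector_units(void)
{
    assert(sectors_to_kib(256) == 128);   /* "256 (128k)" */
    assert(sectors_to_pages(256) == 32);  /* 128 KiB in 4 KiB pages */
}
```

Under these assumptions a 512-sector limit corresponds to 256 KiB, or 64
pages.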

-- 
Jens Axboe


* Re: RE: RE: poor domU VBD performance.
  2005-03-29 22:45 ` RE: " Kurt Garloff
  2005-03-30  8:53   ` Jens Axboe
@ 2005-03-30 10:00   ` Kurt Garloff
  1 sibling, 0 replies; 29+ messages in thread
From: Kurt Garloff @ 2005-03-30 10:00 UTC (permalink / raw)
  To: Xen development list
  Cc: Ian Pratt, Christian Limpach, Jens Axboe, Vincent Hanquez



On Wed, Mar 30, 2005 at 12:45:03AM +0200, Kurt Garloff wrote:
> Please test attached patch.

Delete it, blk_queue_max_sectors() is called a bit above.
Adding printk()s now to see what's going on there.

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.



* RE: RE: RE: poor domU VBD performance.
@ 2005-03-30 11:16 Ian Pratt
  2005-03-30 17:01 ` peter bier
  2005-03-31  7:05 ` RE: " Jens Axboe
  0 siblings, 2 replies; 29+ messages in thread
From: Ian Pratt @ 2005-03-30 11:16 UTC (permalink / raw)
  To: Jens Axboe, Kurt Garloff
  Cc: Vincent Hanquez, Xen development list, Christian Limpach

> I'll check the xen block driver to see if there's anything 
> else that sticks out.
>
> Jens Axboe

Jens, I'd really appreciate this.

The blkfront/blkback drivers have rather evolved over time, and I don't
think any of the core team fully understand the block-layer differences
between 2.4 and 2.6. 

There's also some junk left in there from when the backend was in Xen
itself back in the days of 1.2, though Vincent has prepared a patch to
clean this up and also make 'refreshing' of vbd's work (for size
changes), and also allow the blkfront driver to import whole disks
rather than partitions. We had this functionality on 2.4, but lost it in
the move to 2.6.

My bet is that the true performance bug lies in the 2.6 backend. Using a
2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
performance under a wide variety of circumstances. Using a 2.6 dom0 is
far more pernickety. I agree with Andrew: I suspect it's the work queue
changes that are biting us when we don't have many outstanding requests.

Thanks,
Ian


* Re: poor domU VBD performance.
  2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
@ 2005-03-30 17:01 ` peter bier
  2005-03-30 18:05   ` Andrew Theurer
  2005-03-31  7:05 ` RE: " Jens Axboe
  1 sibling, 1 reply; 29+ messages in thread
From: peter bier @ 2005-03-30 17:01 UTC (permalink / raw)
  To: xen-devel

Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:

> 
> > I'll check the xen block driver to see if there's anything 
> > else that sticks out.
> >
> > Jens Axboe
> 
> Jens, I'd really appreciate this.
> 
> The blkfront/blkback drivers have rather evolved over time, and I don't
> think any of the core team fully understand the block-layer differences
> between 2.4 and 2.6. 
> 
> There's also some junk left in there from when the backend was in Xen
> itself back in the days of 1.2, though Vincent has prepared a patch to
> clean this up and also make 'refreshing' of vbd's work (for size
> changes), and also allow the blkfront driver to import whole disks
> rather than partitions. We had this functionality on 2.4, but lost it in
> the move to 2.6.
> 
> My bet is that the true performance bug lies in the 2.6 backend. Using a
> 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
> performance under a wide variety of circumstances. Using a 2.6 dom0 is
> far more pernickety. I agree with Andrew: I suspect it's the work queue
> changes that are biting us when we don't have many outstanding requests.
> 
> Thanks,
> Ian
> 

I ran my simple dd test on hde1 with two different readahead settings:
256 sectors and 512 sectors.

These are the results:

DOM0 readahead 512s

Device:    rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hde     115055.40   2.00   592.40  0.80 115647.80  22.40 57823.90 11.20   194.99     2.30  3.88  1.68  99.80
hda          0.00   0.00     0.00  0.00      0.00   0.00     0.00  0.00     0.00     0.00  0.00  0.00   0.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00   31.60   14.20   54.00

DOMU readahead 512s

Device:    rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hda1         0.00   0.20     0.00  0.00      0.00   3.20     0.00  1.60     0.00     0.00  0.00  0.00   0.00
hde1    102301.40   0.00 11571.00  0.00 113868.80   0.00 56934.40  0.00     9.84    68.45  5.92  0.09 100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.00    0.00   35.00   65.00    0.00

DOM0 readahead 256s

Device:    rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hde      28289.20   1.80   126.80  0.40  28416.00  17.60 14208.00  8.80   223.53     1.06  8.32  7.85  99.80
hda          0.00   0.00     0.00  0.00      0.00   0.00     0.00  0.00     0.00     0.00  0.00  0.00   0.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.60    5.60   92.60

DOMU readahead 256s

Device:    rrqm/s wrqm/s      r/s   w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
hda1         0.00   0.20     0.00  0.40      0.00   4.80     0.00  2.40    12.00     0.00  0.00  0.00   0.00
hde1     25085.60   0.00  3330.40  0.00  28416.00   0.00 14208.00  0.00     8.53    30.54  9.17  0.30 100.00

avg-cpu:  %user   %nice %system %iowait   %idle
           0.20    0.00    1.40   98.40    0.00

What surprises me is that the service time for requests in DOM0 decreases
dramatically when readahead is increased from 256 to 512 sectors. If the output
of iostat is reliable, it tells me that requests in DOMU are assembled to about
8 to 10 sectors in size, while DOM0 merges them into requests of about 200
sectors or more.
Using a readahead of 256 sectors results in an average queue size of about 1,
while changing readahead to 512 sectors results in an average queue size of
slightly above 2 on DOM0. Service times in DOM0 with a readahead of 256 sectors
seem to be in the range of the typical seek time of a modern IDE disk, while
they are significantly lower with a readahead of 512 sectors.
As I have mentioned, this is the system with only one installed disk; this
results in the write activity on the disk. The two write requests per second
go to a different partition and result in four required seeks per
second. This should not be a reason for all requests to have a service time
of about one seek time.
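
The avgrq-sz column can be cross-checked against the other columns: it is
the number of sectors transferred per second divided by the number of
requests per second. Likewise, by Little's law, avgqu-sz should come out
close to the request rate multiplied by await. A small user-space sketch of
both checks follows; the helper names are hypothetical and the tolerances
are only there to absorb iostat's rounding.

```c
#include <assert.h>

/* Average request size in sectors: sectors moved per request issued. */
static double avg_request_sectors(double rsec_s, double wsec_s,
                                  double r_s, double w_s)
{
    double reqs = r_s + w_s;

    return reqs > 0.0 ? (rsec_s + wsec_s) / reqs : 0.0;
}

/* Little's law: mean queue length = request rate * mean wait time. */
static double expected_queue_len(double r_s, double w_s, double await_ms)
{
    assert(await_ms >= 0.0);
    return (r_s + w_s) * await_ms / 1000.0;
}

/* True if a and b differ by less than tol. */
static int approx_eq(double a, double b, double tol)
{
    double diff = a - b;

    return diff < tol && -diff < tol;
}
```

Fed with the DOM0 figures for the 512-sector run (rsec/s 115647.80,
wsec/s 22.40, r/s 592.40, w/s 0.80, await 3.88 ms), these give roughly 195
sectors per request and a queue length of roughly 2.3, in line with the
reported avgrq-sz and avgqu-sz.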

I have done a number of further tests on various systems. In most cases I failed
to achieve service times below 8 msecs in Dom0; the only counterexample is
reported above. It seems to me that at low readahead values the amount of
data requested from disk is simply the readahead amount. Such a
request takes about one seek time, and thus I get lower performance when I work
with small readahead values.
What I do not understand at all is why throughput collapses with large
readahead sizes.

I found in mm/readahead.c that the readahead size for a file is updated if
the readahead is not efficient. I suspect that this mechanism might lead to
readahead being switched off for this file.
With readahead set to 2048 sectors, the product of avgqu-sz and avgrq-sz
reported by iostat drops to 4 to 5 physical pages.
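
The product mentioned here is the average number of sectors sitting in the
device queue; dividing it by the number of sectors per page expresses it in
pages. A minimal sketch, assuming 512-byte sectors and 4 KiB pages, with a
hypothetical helper name:

```c
#include <assert.h>

#define SECTORS_PER_PAGE 8.0 /* assumed: 4096-byte page / 512-byte sector */

/* Average amount of queued data in pages, from two iostat columns. */
static double queued_pages(double avgqu_sz, double avgrq_sz)
{
    assert(avgqu_sz >= 0.0 && avgrq_sz >= 0.0);
    return avgqu_sz * avgrq_sz / SECTORS_PER_PAGE;
}
```

For comparison, the DOM0 run with a 512-sector readahead shown earlier
(avgqu-sz 2.30, avgrq-sz 194.99) works out to about 56 pages.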

Peter 


* Re: Re: poor domU VBD performance.
  2005-03-30 17:01 ` peter bier
@ 2005-03-30 18:05   ` Andrew Theurer
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Theurer @ 2005-03-30 18:05 UTC (permalink / raw)
  To: peter bier; +Cc: xen-devel

peter bier wrote:

>Ian Pratt <m+Ian.Pratt <at> cl.cam.ac.uk> writes:
>
>  
>
>>>I'll check the xen block driver to see if there's anything 
>>>else that sticks out.
>>>
>>>Jens Axboe
>>>      
>>>
>>Jens, I'd really appreciate this.
>>
>>The blkfront/blkback drivers have rather evolved over time, and I don't
>>think any of the core team fully understand the block-layer differences
>>between 2.4 and 2.6. 
>>
>>There's also some junk left in there from when the backend was in Xen
>>itself back in the days of 1.2, though Vincent has prepared a patch to
>>clean this up and also make 'refreshing' of vbd's work (for size
>>changes), and also allow the blkfront driver to import whole disks
>>rather than partitions. We had this functionality on 2.4, but lost it in
>>the move to 2.6.
>>
>>My bet is that the true performance bug lies in the 2.6 backend. Using a
>>2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
>>performance under a wide variety of circumstances. Using a 2.6 dom0 is
>>far more pernickety. I agree with Andrew: I suspect it's the work queue
>>changes that are biting us when we don't have many outstanding requests.
>>
>>Thanks,
>>Ian
>>
>>    
>>
>
>
>I ran my simple dd test on hde1 with two different readahead settings:
>256 sectors and 512 sectors.
>
I added a counter, incremented it every time the blkback daemon was woken up,
and ran the read test in domU.  With 32k and 320k request sizes
(O_DIRECT), I consistently got 200 wake-ups/second.  I expected
100/second, the same interval as the minimum svc cmt times I am seeing,
but anyway, 200/sec is way too low for small request sizes.  I think this
confirms the latency issue.  Not sure yet why it cannot wake up more
frequently.
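
To see why a fixed wake-up rate hurts small requests most, here is a rough
back-of-the-envelope sketch. It assumes, purely for illustration, that the
daemon services a fixed number of requests per wake-up; that assumption and
the helper names are not taken from the blkback code.

```c
#include <assert.h>

/* Upper bound on throughput in KiB/s for a given wake-up rate. */
static unsigned long wakeup_bound_kib_per_sec(unsigned long wakeups_per_sec,
                                              unsigned long reqs_per_wakeup,
                                              unsigned long req_kib)
{
    return wakeups_per_sec * reqs_per_wakeup * req_kib;
}

/* Time between wake-ups in milliseconds. */
static double wakeup_interval_ms(unsigned long wakeups_per_sec)
{
    assert(wakeups_per_sec > 0);
    return 1000.0 / (double)wakeups_per_sec;
}
```

With 200 wake-ups per second (one every 5 ms) and one request per wake-up,
32k requests would be capped at 6400 KiB/s, while 320k requests would be
capped at 64000 KiB/s.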

-Andrew


* Re: RE: RE: poor domU VBD performance.
  2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
  2005-03-30 17:01 ` peter bier
@ 2005-03-31  7:05 ` Jens Axboe
  2005-03-31  7:10   ` Jens Axboe
  1 sibling, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2005-03-31  7:05 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Wed, Mar 30 2005, Ian Pratt wrote:
> > I'll check the xen block driver to see if there's anything 
> > else that sticks out.
> >
> > Jens Axboe
> 
> Jens, I'd really appreciate this.
> 
> The blkfront/blkback drivers have rather evolved over time, and I don't
> think any of the core team fully understand the block-layer differences
> between 2.4 and 2.6. 
> 
> There's also some junk left in there from when the backend was in Xen
> itself back in the days of 1.2, though Vincent has prepared a patch to
> clean this up and also make 'refreshing' of vbd's work (for size
> changes), and also allow the blkfront driver to import whole disks
> rather than partitions. We had this functionality on 2.4, but lost it in
> the move to 2.6.
> 
> My bet is that the true performance bug lies in the 2.6 backend. Using a
> 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
> performance under a wide variety of circumstances. Using a 2.6 dom0 is
> far more pernickety. I agree with Andrew: I suspect it's the work queue
> changes that are biting us when we don't have many outstanding requests.

You never schedule the queues you submit the io against for the 2.6
kernel, you only have a tq_disk run for 2.4 kernels. This basically puts
you at the mercy of the timeout unplugging, which is really suboptimal
unless you can keep the io queue of the target busy at all times.

You need to either mark the last bio going to that device as BIO_SYNC,
or do a blk_run_queue() on the target queue after having submitted all
io in this batch for it.
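
The first option needs to know which bio in a batch is the last one headed
for each device. A user-space sketch of that bookkeeping follows. The
integer device ids and the function name are stand-ins for illustration;
in the driver the result would decide which bio gets the sync flag.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For each entry i of a batch, set is_last[i] to 1 if no later entry
 * targets the same device, and to 0 otherwise. Batches are assumed to
 * be small, so the quadratic scan is acceptable.
 */
static void mark_last_per_device(const int *dev, size_t n, int *is_last)
{
    size_t i, j;

    assert(n == 0 || (dev != NULL && is_last != NULL));

    for (i = 0; i < n; i++) {
        is_last[i] = 1;
        for (j = i + 1; j < n; j++) {
            if (dev[j] == dev[i]) {
                is_last[i] = 0; /* a later entry hits the same device */
                break;
            }
        }
    }
}
```

For a batch targeting devices 1, 1, 2, 1, 2 only the fourth and fifth
entries are marked, so each device is kicked exactly once.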

-- 
Jens Axboe


* Re: RE: RE: poor domU VBD performance.
  2005-03-31  7:05 ` RE: " Jens Axboe
@ 2005-03-31  7:10   ` Jens Axboe
  2005-03-31  8:17     ` Keir Fraser
  0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2005-03-31  7:10 UTC (permalink / raw)
  To: Ian Pratt
  Cc: Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Jens Axboe wrote:
> On Wed, Mar 30 2005, Ian Pratt wrote:
> > > I'll check the xen block driver to see if there's anything 
> > > else that sticks out.
> > >
> > > Jens Axboe
> > 
> > Jens, I'd really appreciate this.
> > 
> > The blkfront/blkback drivers have rather evolved over time, and I don't
> > think any of the core team fully understand the block-layer differences
> > between 2.4 and 2.6. 
> > 
> > There's also some junk left in there from when the backend was in Xen
> > itself back in the days of 1.2, though Vincent has prepared a patch to
> > clean this up and also make 'refreshing' of vbd's work (for size
> > changes), and also allow the blkfront driver to import whole disks
> > rather than partitions. We had this functionality on 2.4, but lost it in
> > the move to 2.6.
> > 
> > My bet is that the true performance bug lies in the 2.6 backend. Using a
> > 2.6 domU blkfront talking to a 2.4 dom0 blkback seems to give good
> > performance under a wide variety of circumstances. Using a 2.6 dom0 is
> > far more pernickety. I agree with Andrew: I suspect it's the work queue
> > changes that are biting us when we don't have many outstanding requests.
> 
> You never schedule the queues you submit the io against for the 2.6
> kernel, you only have a tq_disk run for 2.4 kernels. This basically puts
> you at the mercy of the timeout unplugging, which is really suboptimal
> unless you can keep the io queue of the target busy at all times.
> 
> You need to either mark the last bio going to that device as BIO_SYNC,
> or do a blk_run_queue() on the target queue after having submitted all
> io in this batch for it.

Here is a temporary work-around, this should bring you close to 100%
performance at the cost of some extra unplugs. Uncompiled.

--- blkback.c~	2005-03-31 09:06:16.000000000 +0200
+++ blkback.c	2005-03-31 09:09:27.000000000 +0200
@@ -481,7 +481,6 @@
     for ( i = 0; i < nr_psegs; i++ )
     {
         struct bio *bio;
-        struct bio_vec *bv;
 
         bio = bio_alloc(GFP_ATOMIC, 1);
         if ( unlikely(bio == NULL) )
@@ -494,17 +493,12 @@
         bio->bi_private = pending_req;
         bio->bi_end_io  = end_block_io_op;
         bio->bi_sector  = phys_seg[i].sector_number;
-        bio->bi_rw      = operation;
 
-        bv = bio_iovec_idx(bio, 0);
-        bv->bv_page   = virt_to_page(MMAP_VADDR(pending_idx, i));
-        bv->bv_len    = phys_seg[i].nr_sects << 9;
-        bv->bv_offset = phys_seg[i].buffer & ~PAGE_MASK;
+	bio_add_page(bio, virt_to_page(MMAP_VADDR(pending_idx, i)),
+			phys_seg[i].nr_sects << 9,
+			phys_seg[i].buffer & ~PAGE_MASK);
 
-        bio->bi_size    = bv->bv_len;
-        bio->bi_vcnt++;
-
-        submit_bio(operation, bio);
+        submit_bio(operation | (1 << BIO_RW_SYNC), bio);
     }
 #endif
 

-- 
Jens Axboe


* Re: poor domU VBD performance.
  2005-03-31  7:10   ` Jens Axboe
@ 2005-03-31  8:17     ` Keir Fraser
  2005-03-31  8:19       ` Jens Axboe
  0 siblings, 1 reply; 29+ messages in thread
From: Keir Fraser @ 2005-03-31  8:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On 31 Mar 2005, at 08:10, Jens Axboe wrote:

> Here is a temporary work-around, this should bring you close to 100%
> performance at the cost of some extra unplugs. Uncompiled.

Yep, this does the job for me. Thanks! Avoiding the extra unplugs is 
harder than it sounds as each request in a batch may go to a different 
request queue. To minimise the number of unplugs per batch we'd need to 
add code to remember which queues we had used in the current batch, 
then kick them at the end of the batch. Is there likely to be any 
measurable benefit from reducing the number of unplugs?

  -- Keir


* Re: poor domU VBD performance.
  2005-03-31  8:17     ` Keir Fraser
@ 2005-03-31  8:19       ` Jens Axboe
  2005-03-31 14:33         ` Philip R Auld
  0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2005-03-31  8:19 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 08:10, Jens Axboe wrote:
> 
> >Here is a temporary work-around, this should bring you close to 100%
> >performance at the cost of some extra unplugs. Uncompiled.
> 
> Yep, this does the job for me. Thanks! Avoiding the extra unplugs is 
> harder than it sounds as each request in a batch may go to a different 
> request queue. To minimise the number of unplugs per batch we'd need to 
> add code to remember which queues we had used in the current batch, 
> then kick them at the end of the batch. Is there likely to be any 

Or just keep track of the previous queue, if that has changed unplug the
previous queue and update previous queue variable.
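
The previous-queue idea can be modelled in user space to see how many
unplugs it issues for a given batch. In this sketch queues are represented
by plain integer ids and the function name is hypothetical. The final kick
at the end of the batch is an assumption added here so that the last queue
used is not left plugged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the unplugs ("kicks") issued for one batch when only the
 * previously used queue is remembered: the old queue is kicked whenever
 * the target changes, and the last queue is kicked when the batch ends.
 */
static size_t count_kicks_prev_tracking(const int *queue, size_t n)
{
    size_t i, kicks = 0;

    assert(n == 0 || queue != NULL);
    if (n == 0)
        return 0;

    for (i = 1; i < n; i++) {
        if (queue[i] != queue[i - 1])
            kicks++;            /* target changed: kick the old queue */
    }
    return kicks + 1;           /* kick whatever queue was used last */
}
```

A batch that stays on one queue costs a single kick instead of one per
request. A batch that alternates between two queues on every request gains
nothing over kicking each time, which is the case that would need the
fuller bookkeeping of every queue touched in the batch.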

> measurable benefit from reducing the number of unplugs?

Probably not, since the plugging happened at the front end as well. So
you should get a nice stream of io in any way.

-- 
Jens Axboe


* Re: poor domU VBD performance.
  2005-03-31  8:19       ` Jens Axboe
@ 2005-03-31 14:33         ` Philip R Auld
  2005-03-31 15:34           ` Kurt Garloff
  0 siblings, 1 reply; 29+ messages in thread
From: Philip R Auld @ 2005-03-31 14:33 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 10:19:01AM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> > measurable benefit from reducing the number of unplugs?
> 
> Probably not, since the plugging happened at the front end as well. So
> you should get a nice stream of io in any way.

This affects merging though, right? I don't think the front
end has done any merging. 

Also the BIO_RW_SYNC bit is sometimes ignored in __make_request
due to the bad queue locking interactions with scsi_request_fn.

The bio can be completed before the bio_sync() test in 
__make_request. Since there is no other reference to the bio it 
can be freed and reused by the time it is tested for BIO_RW_SYNC.

Cheers,

Phil



-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752


* Re: poor domU VBD performance.
  2005-03-31 14:33         ` Philip R Auld
@ 2005-03-31 15:34           ` Kurt Garloff
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 16:53             ` Philip R Auld
  0 siblings, 2 replies; 29+ messages in thread
From: Kurt Garloff @ 2005-03-31 15:34 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
	Christian Limpach



Hi,

On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> This affects merging though, right? I don't think the front
> end has done any merging. 

The noop elevator does front and back merging.
My understanding is that it's used in the frontend driver.

Otherwise, unplugging on every block would indeed be quite bad ...

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.



* Re: poor domU VBD performance.
  2005-03-31 15:34           ` Kurt Garloff
@ 2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
                                 ` (2 more replies)
  2005-03-31 16:53             ` Philip R Auld
  1 sibling, 3 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 15:39 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Kurt Garloff wrote:
> Hi,
> 
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging. 
> 
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.
> 
> Otherwise, unplugging on every block would indeed be quite bad ...

Not necessarily - either your io rate is not fast enough to sustain a
substantial queue depth, in that case you get plugging on basically
every io anyways. If on the other hand the io rate is high enough to
maintain a queue depth of > 1, then the plugging will never take place
because the queue never empties.
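
A toy model makes the two cases easy to compare. It assumes, as a
simplification of the real block layer, that a queue is plugged exactly
when a request is submitted to an empty queue. The event string format and
the function name are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count plug events for a sequence of queue events, where '+' is a
 * submission and '-' is a completion. A plug is counted whenever a
 * request arrives while the queue is empty.
 */
static unsigned int count_plugs(const char *events)
{
    unsigned int depth = 0, plugs = 0;
    const char *p;

    assert(events != NULL);

    for (p = events; *p != '\0'; p++) {
        if (*p == '+') {
            if (depth == 0)
                plugs++;        /* empty queue: this request plugs it */
            depth++;
        } else if (*p == '-') {
            assert(depth > 0);  /* cannot complete what was never queued */
            depth--;
        }
    }
    return plugs;
}
```

A slow submitter such as "+-+-+-" plugs on every request, while a
submitter that keeps at least one request outstanding, such as
"++-+-+-+--", plugs only once at the start.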

So all in all, I don't think the temporary work-around will be such a
bad idea. I would still rather implement the queue tracking though, it
should not be more than a few lines of code.

And Philip, I will get the bio_sync() change merged :-)

-- 
Jens Axboe


* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
@ 2005-03-31 15:41               ` Jens Axboe
  2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:55               ` Philip R Auld
  2 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 15:41 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Philip R Auld, Xen development list, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Jens Axboe wrote:
> On Thu, Mar 31 2005, Kurt Garloff wrote:
> > Hi,
> > 
> > On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging. 
> > 
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
> > 
> > Otherwise, unplugging on every block would indeed be quite bad ...
> 
> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
> 
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.

There are still cases where it will be suboptimal of course, I didn't
intend to claim it will always be as fast as queue tracking! If you are
unlucky enough that the first request will reach the target device and
get started before the next one, you will have a small and a large part
of any given request executed. This isn't good for performance,
naturally. But queueing is so fast, I would be surprised if this
happened much in the real world.

-- 
Jens Axboe


* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
@ 2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:02                 ` Andrew Theurer
  2005-03-31 17:44                 ` Jens Axboe
  2005-03-31 16:55               ` Philip R Auld
  2 siblings, 2 replies; 29+ messages in thread
From: Keir Fraser @ 2005-03-31 15:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach


On 31 Mar 2005, at 16:39, Jens Axboe wrote:

> Not necessarily - either your io rate is not fast enough to sustain a
> substantial queue depth, in that case you get plugging on basically
> every io anyways. If on the other hand the io rate is high enough to
> maintain a queue depth of > 1, then the plugging will never take place
> because the queue never empties.
>
> So all in all, I don't think the temporary work-around will be such a
> bad idea. I would still rather implement the queue tracking though, it
> should not be more than a few lines of code.

I've checked in something along the lines of what you described into 
both the 2.0-testing and the unstable trees. Looks to have identical 
performance to the original simple patch, at least for a bulk 'dd'.

  -- Keir


* Re: poor domU VBD performance.
  2005-03-31 15:49               ` Keir Fraser
@ 2005-03-31 16:02                 ` Andrew Theurer
  2005-03-31 17:44                 ` Jens Axboe
  1 sibling, 0 replies; 29+ messages in thread
From: Andrew Theurer @ 2005-03-31 16:02 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Jens Axboe, Christian Limpach

Keir Fraser wrote:

>
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
>
>> Not necessarily - either your io rate is not fast enough to sustain a
>> substantial queue depth, in that case you get plugging on basically
>> every io anyways. If on the other hand the io rate is high enough to
>> maintain a queue depth of > 1, then the plugging will never take place
>> because the queue never empties.
>>
>> So all in all, I don't think the temporary work-around will be such a
>> bad idea. I would still rather implement the queue tracking though, it
>> should not be more than a few lines of code.
>
>
> I've checked in something along the lines of what you described into 
> both the 2.0-testing and the unstable trees. Looks to have identical 
> performance to the original simple patch, at least for a bulk 'dd'.

I'll do a pull of unstable and see what I get with O_DIRECT, thanks.

-Andrew

* Re: poor domU VBD performance.
  2005-03-31 15:41               ` Jens Axboe
@ 2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 17:43                   ` Jens Axboe
  2005-03-31 18:27                   ` Kurt Garloff
  0 siblings, 2 replies; 29+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 16:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

Jens Axboe wrote:

> There are still cases where it will be suboptimal of course, I didn't
> intend to claim it will always be as fast as queue tracking! If you are
> unlucky enough that the first request will reach the target device and
> get started before the next one, you will have a small and a large part
> of any given request executed. This isn't good for performance,
> naturally. But queueing is so fast, I would be surprised if this
> happened much in the real world.

Although the usual answer for what scheduling algorithm is
best is almost always "depends on the workload", it was
suggested to me that the cfq was still the best option to
go with. What do people feel about that? (Or is AS going
to remain default?).

Also, we're making the assumption here that guest OS = virtual
driver/device. I would rather we not make that assumption
always. This may be moot because I was also told there might
be a patch floating around (-mm ?) that allows you to
select scheduling algorithm on a per-device basis. Anyone
know if this is going to come in anytime soon?

thanks,
Nivedita

* Re: poor domU VBD performance.
  2005-03-31 15:34           ` Kurt Garloff
  2005-03-31 15:39             ` Jens Axboe
@ 2005-03-31 16:53             ` Philip R Auld
  2005-03-31 18:01               ` Jens Axboe
  1 sibling, 1 reply; 29+ messages in thread
From: Philip R Auld @ 2005-03-31 16:53 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Xen development list, Vincent Hanquez, Jens Axboe,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 05:34:49PM +0200 Kurt Garloff said:
> Hi,
> 
> On Thu, Mar 31, 2005 at 09:33:12AM -0500, Philip R Auld wrote:
> > This affects merging though, right? I don't think the front
> > end has done any merging.
> 
> The noop elevator does front and back merging.
> My understanding is that it's used in the frontend driver.

If that is the case, it can only merge things that are 
machine contiguous. Current guests know this mapping, but 
can they get this when running unmodified with VT-x?
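
To make the machine-contiguity point concrete, here is a rough userspace sketch. It assumes the guest has a table mapping its pseudo-physical frame numbers to machine frame numbers; the array p2m and the helper machine_contiguous() are hypothetical names for illustration, not the actual frontend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: p2m[pfn] is the machine frame backing guest
 * pseudo-physical frame pfn. Returns true if the npages frames starting
 * at pfn are also adjacent in machine memory, i.e. safe to merge.
 */
bool machine_contiguous(const unsigned long *p2m, size_t pfn, size_t npages)
{
    assert(p2m != NULL);

    for (size_t i = 1; i < npages; i++) {
        if (p2m[pfn + i] != p2m[pfn] + i)
            return false;       /* adjacent for the guest, not on the machine */
    }
    return true;
}
```

Two pages that look adjacent to the guest only qualify for a merge when this check passes, which is why a guest without access to the mapping cannot make that decision itself.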

My experience showed very little if any multipage 
IO coming out of the front end.

> 
> Otherwise, unplugging on every block would indeed be quite bad ...

Seems to be somewhat moot anyway given the current change planned :)

Cheers,

Phil
> 
> Regards,
> -- 
> Kurt Garloff, Director SUSE Labs, Novell Inc.



-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: poor domU VBD performance.
  2005-03-31 15:39             ` Jens Axboe
  2005-03-31 15:41               ` Jens Axboe
  2005-03-31 15:49               ` Keir Fraser
@ 2005-03-31 16:55               ` Philip R Auld
  2 siblings, 0 replies; 29+ messages in thread
From: Philip R Auld @ 2005-03-31 16:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 05:39:26PM +0200 Jens Axboe said:
> 
> And Philip, I will get the bio_sync() change merged :-)


Thanks! It's good to be transparent ;)



Phil

> 
> -- 
> Jens Axboe

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: poor domU VBD performance.
  2005-03-31 16:27                 ` Nivedita Singhvi
@ 2005-03-31 17:43                   ` Jens Axboe
  2005-03-31 18:27                   ` Kurt Garloff
  1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 17:43 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Nivedita Singhvi wrote:
> Jens Axboe wrote:
> 
> >There are still cases where it will be suboptimal of course, I didn't
> >intend to claim it will always be as fast as queue tracking! If you are
> >unlucky enough that the first request will reach the target device and
> >get started before the next one, you will have a small and a large part
> >of any given request executed. This isn't good for performance,
> >naturally. But queueing is so fast, I would be surprised if this
> >happened much in the real world.
> 
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).

Really the only one that you should not use is AS, anything else will be
fine. AS should only ever be used at the bottom of the stack, if on a
single spindle backing. CFQ will be fine, as will deadline and noop.

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone
> know if this is going to come in anytime soon?

That patch is in mainline since 2.6.10. You can change schedulers by
echoing the preferred scheduler to /sys/block/<device>/queue/scheduler -
reading that file will show you what schedulers are available.
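
For anyone who wants to do the same from a program rather than with echo, a minimal sketch might look like the following. The helper names scheduler_path() and set_scheduler() are invented for this example; only the sysfs path layout comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<dev>/queue/scheduler" in buf.
 * Returns 0 on success, -1 if dev contains '/' or the path does not fit.
 */
int scheduler_path(const char *dev, char *buf, size_t buflen)
{
    assert(dev != NULL && buf != NULL);

    if (strchr(dev, '/') != NULL)
        return -1;              /* keep the path inside /sys/block */

    int n = snprintf(buf, buflen, "/sys/block/%s/queue/scheduler", dev);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Write a scheduler name to the device's sysfs file; 0 on success, -1 on error. */
int set_scheduler(const char *dev, const char *name)
{
    char path[256];

    assert(name != NULL);
    if (scheduler_path(dev, path, sizeof path) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no such device, or insufficient privileges */

    int ok = (fputs(name, f) >= 0);
    if (fclose(f) != 0)         /* buffered data is written here, so errors show up now */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_scheduler("hda", "noop") as root would be the equivalent of the echo; the device name is of course only an example.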

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 15:49               ` Keir Fraser
  2005-03-31 16:02                 ` Andrew Theurer
@ 2005-03-31 17:44                 ` Jens Axboe
  1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 17:44 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> 
> On 31 Mar 2005, at 16:39, Jens Axboe wrote:
> 
> >Not necessarily - either your io rate is not fast enough to sustain a
> >substantial queue depth, in that case you get plugging on basically
> >every io anyways. If on the other hand the io rate is high enough to
> >maintain a queue depth of > 1, then the plugging will never take place
> >because the queue never empties.
> >
> >So all in all, I don't think the temporary work-around will be such a
> >bad idea. I would still rather implement the queue tracking though, it
> >should not be more than a few lines of code.
> 
> I've checked in something along the lines of what you described into 
> both the 2.0-testing and the unstable trees. Looks to have identical 
> performance to the original simple patch, at least for a bulk 'dd'.

Can you post the patch here for review? Or just point me somewhere I can
view it.

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 16:53             ` Philip R Auld
@ 2005-03-31 18:01               ` Jens Axboe
  2005-03-31 18:43                 ` Philip R Auld
  0 siblings, 1 reply; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 18:01 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Philip R Auld wrote:
> > > This affects merging though, right? I don't think the front
> > > end has done any merging.
> > 
> > The noop elevator does front and back merging.
> > My understanding is that it's used in the frontend driver.
> 
> If that is the case, it can only merge things that are 
> machine contiguous. Current guests know this mapping, but 
> can they get this when running unmodified with VT-x?
> 
> My experience showed very little if any multipage 
> IO coming out of the front end.

There aren't that many users of multipage ios yet. direct io will use
it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
definitely improving :-)

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 16:27                 ` Nivedita Singhvi
  2005-03-31 17:43                   ` Jens Axboe
@ 2005-03-31 18:27                   ` Kurt Garloff
  2005-03-31 21:59                     ` Nivedita Singhvi
  1 sibling, 1 reply; 29+ messages in thread
From: Kurt Garloff @ 2005-03-31 18:27 UTC (permalink / raw)
  To: Nivedita Singhvi
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Jens Axboe, Christian Limpach


Hi Niv,

On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> Although the usual answer for what scheduling algorithm is
> best is almost always "depends on the workload", it was
> suggested to me that the cfq was still the best option to
> go with. What do people feel about that? (Or is AS going
> to remain default?).

This is a different discussion.
But, yes, I would agree that CFQ (v3) is the best default choice.

Jens, should we maybe make sure that the blockback driver uses 
different (fake) UIDs for the domains that it serves, to provide 
fairness between them? The next step would be to allow tweaking 
IO priorities. Or, to make it more general, add a parameter (call
it uid) that a block driver can pass down to the IO scheduler
and that would normally be current->uid but may be set differently?

> Also, we're making the assumption here that guest OS = virtual
> driver/device. I would rather we not make that assumption
> always. This may be moot because I was also told there might
> be a patch floating around (-mm ?) that allows you to
> select scheduling algorithm on a per-device basis. Anyone

It's part of 2.6.11.
garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
noop anticipatory deadline [cfq]
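
The bracketed entry in that output is the scheduler currently in use. As a small sketch of how a tool could pick it out of such a line (active_scheduler() is a name made up for this example, and the parsing assumes the one-line format shown above):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "noop anticipatory deadline [cfq]" into out.
 * Returns 0 on success, -1 if there is no bracketed name or it does not fit.
 */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *lb = strchr(line, '[');
    if (lb == NULL)
        return -1;
    const char *rb = strchr(lb + 1, ']');
    if (rb == NULL)
        return -1;

    size_t len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)
        return -1;              /* no room for the name plus terminator */

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```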

Regards,
-- 
Kurt Garloff, Director SUSE Labs, Novell Inc.

* Re: poor domU VBD performance.
  2005-03-31 18:01               ` Jens Axboe
@ 2005-03-31 18:43                 ` Philip R Auld
  2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:21                   ` Jens Axboe
  0 siblings, 2 replies; 29+ messages in thread
From: Philip R Auld @ 2005-03-31 18:43 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> On Thu, Mar 31 2005, Philip R Auld wrote:
> > 
> > My experience showed very little if any multipage 
> > IO coming out of the front end.
> 
> There aren't that many users of multipage ios yet. direct io will use
> it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> definitely improving :-)

Sorry, I was being sloppy with terminology :)

What I was getting at was that the backend will split requests
up and issue each physical segment as a separate bio (at least in 
the 2.0.5 tree I have in front of me). And that none of these 
physical segments was more than 1 page. 

So the request merging in the back end OS is important, no?


Cheers,

Phil

> 
> -- 
> Jens Axboe

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: poor domU VBD performance.
  2005-03-31 18:43                 ` Philip R Auld
@ 2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:10                     ` Keir Fraser
  2005-03-31 19:20                     ` Jens Axboe
  2005-03-31 19:21                   ` Jens Axboe
  1 sibling, 2 replies; 29+ messages in thread
From: Keir Fraser @ 2005-03-31 19:07 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Jens Axboe, Christian Limpach

> What I was getting at was that the backend  will split requests
> up and issue each physical segment as a separate bio  (at least in
> the 2.0.5 tree I have in front of me). And that none of these
> physical segments was more than 1 page.
>
> So the request merging in the back end OS is important, no?

Ah, this reminds me I have one more question for Jens.

Since all the bio's that I queue up in a single invocation of 
dispatch_rw_block_io() will actually be adjacent to each other (because 
they're all from the same scatter-gather list) can I actually do 
something like (very roughly):

bio = bio_alloc(GFP_KERNEL, nr_psegs);
for ( i = 0; i < nr_psegs; i++ )
    bio_add_page(bio, blah...);
submit_bio(operation, bio);

Each of the biovecs that I queue may not be a full page in size (but 
won't straddle a page boundary of course).

This would avoid the bio's having to be merged again later.

  -- Keir

* Re: poor domU VBD performance.
  2005-03-31 19:07                   ` Keir Fraser
@ 2005-03-31 19:10                     ` Keir Fraser
  2005-03-31 19:20                     ` Jens Axboe
  1 sibling, 0 replies; 29+ messages in thread
From: Keir Fraser @ 2005-03-31 19:10 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Philip R Auld,
	Vincent Hanquez, Jens Axboe, Christian Limpach


On 31 Mar 2005, at 20:07, Keir Fraser wrote:

> Since all the bio's that I queue up in a single invocation of 
> dispatch_rw_block_io() will actually be adjacent to each other 
> (because they're all from the same scatter-gather list)

I should add: I know that the code makes it look like each s-g element 
might map somewhere entirely different from the previous one, but we no 
longer support that mode of operation. Each VBD now always maps onto a 
single, entire block device or partition.

  -- Keir

* Re: poor domU VBD performance.
  2005-03-31 19:07                   ` Keir Fraser
  2005-03-31 19:10                     ` Keir Fraser
@ 2005-03-31 19:20                     ` Jens Axboe
  1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 19:20 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Ian Pratt, Philip R Auld, Kurt Garloff, Xen development list,
	Vincent Hanquez, Christian Limpach

On Thu, Mar 31 2005, Keir Fraser wrote:
> >What I was getting at was that the backend  will split requests
> >up and issue each physical segment as a separate bio  (at least in
> >the 2.0.5 tree I have in front of me). And that none of these
> >physical segments was more than 1 page.
> >
> >So the request merging in the back end OS is important, no?
> 
> Ah, this reminds me I have one more question for Jens.
> 
> Since all the bio's that I queue up in a single invocation of 
> dispatch_rw_block_io() will actually be adjacent to each other (because 
> they're all from the same scatter-gather list) can I actually do 
> something like (very roughly):
> 
> bio = bio_alloc(GFP_KERNEL, nr_psegs);
> for ( i = 0; i < nr_psegs; i++ )
>    bio_add_page(bio, blah...);
> submit_bio(operation, bio);
> 
> Each of the biovecs that I queue may not be a full page in size (but 
> won't straddle a page boundary of course).

Yes, this is precisely what you should do, the current method is pretty
suboptimal. Basically allocate a bio with nr_psegs, and call
bio_add_page() for each page until it returns _less_ than the number of
bytes you requested. When it does that, submit that bio for io and
allocate a new bio with nr_psegs-submitted_segs bio_vecs attached.
Continue until you are done.
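
Spelled out as a loop, the procedure could look roughly like this. It is a userspace mock-up, not kernel code: struct mock_bio, mock_bio_add_page() and the max_bytes limit are stand-ins invented for illustration, with the mock returning the added length or 0 in the way bio_add_page() signals a full bio.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a bio: it only tracks how much it holds. */
struct mock_bio {
    size_t max_vecs;    /* bio_vecs allocated for this bio */
    size_t nr_vecs;     /* bio_vecs used so far */
    size_t bytes;       /* bytes added so far */
    size_t max_bytes;   /* pretend queue limit for a single bio */
};

/* Mimics bio_add_page(): returns len if the segment was added, else 0. */
size_t mock_bio_add_page(struct mock_bio *bio, size_t len)
{
    if (bio->nr_vecs == bio->max_vecs || bio->bytes + len > bio->max_bytes)
        return 0;
    bio->nr_vecs++;
    bio->bytes += len;
    return len;
}

/*
 * Pack nr_psegs segments into as few bios as the limit allows.
 * Returns the number of bios "submitted", or 0 if a segment can never fit.
 */
size_t dispatch_segments(const size_t *seg_len, size_t nr_psegs,
                         size_t max_bytes)
{
    assert(seg_len != NULL || nr_psegs == 0);

    size_t submitted = 0, i = 0;

    while (i < nr_psegs) {
        /* "Allocate" a bio sized for the segments still outstanding. */
        struct mock_bio bio = { nr_psegs - i, 0, 0, max_bytes };

        /* Keep adding until the bio takes less than was requested. */
        while (i < nr_psegs &&
               mock_bio_add_page(&bio, seg_len[i]) == seg_len[i])
            i++;

        if (bio.nr_vecs == 0)
            return 0;           /* segment larger than the limit: give up */
        submitted++;            /* stands in for submit_bio() */
    }
    return submitted;
}
```

In the real driver the limit is whatever the target queue enforces, so the only thing the caller needs to watch is the return value of each add.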

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 18:43                 ` Philip R Auld
  2005-03-31 19:07                   ` Keir Fraser
@ 2005-03-31 19:21                   ` Jens Axboe
  1 sibling, 0 replies; 29+ messages in thread
From: Jens Axboe @ 2005-03-31 19:21 UTC (permalink / raw)
  To: Philip R Auld
  Cc: Ian Pratt, Xen development list, Kurt Garloff, Vincent Hanquez,
	Christian Limpach

On Thu, Mar 31 2005, Philip R Auld wrote:
> Rumor has it that on Thu, Mar 31, 2005 at 08:01:52PM +0200 Jens Axboe said:
> > On Thu, Mar 31 2005, Philip R Auld wrote:
> > > 
> > > My experience showed very little if any multipage 
> > > IO coming out of the front end.
> > 
> > There aren't that many users of multipage ios yet. direct io will use
> > it, ext2 will as well. iirc, -mm has patches for ext3 too. so it's
> > definitely improving :-)
> 
> Sorry, I was being sloppy with terminology :)
> 
> What I was getting at was that the backend  will split requests
> up and issue each physical segment as a separate bio  (at least in 
> the 2.0.5 tree I have in front of me). And that none of these 
> > physical segments was more than 1 page. 
> 
> So the request merging in the back end OS is important, no?

I suppose it always is, since the merge criteria may have changed from
when the io was initially queued. If requests are always split into
single pages, then it becomes very important to merge at the backend.
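
As a rough illustration of what back merging buys in that situation, here is a simplified userspace sketch. struct extent and back_merge() are hypothetical names; a real elevator also checks segment and size limits, which this leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* A request covering nr_sectors starting at sector (hypothetical type). */
struct extent {
    unsigned long long sector;
    unsigned long nr_sectors;
};

/*
 * Back-merge in place: whenever a request starts exactly where the
 * previous one ends, fold it into the previous one. Requests are taken
 * in submission order; returns the number of requests left.
 */
size_t back_merge(struct extent *req, size_t n)
{
    assert(req != NULL || n == 0);
    if (n == 0)
        return 0;

    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (req[out].sector + req[out].nr_sectors == req[i].sector)
            req[out].nr_sectors += req[i].nr_sectors;   /* contiguous: grow */
        else
            req[++out] = req[i];                        /* gap: new request */
    }
    return out + 1;
}
```

Five single-page (8-sector) requests, three of them back to back and then two more after a gap, collapse into two requests here, so the device sees two larger ios instead of five small ones.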

-- 
Jens Axboe

* Re: poor domU VBD performance.
  2005-03-31 18:27                   ` Kurt Garloff
@ 2005-03-31 21:59                     ` Nivedita Singhvi
  0 siblings, 0 replies; 29+ messages in thread
From: Nivedita Singhvi @ 2005-03-31 21:59 UTC (permalink / raw)
  To: Kurt Garloff
  Cc: Ian Pratt, Xen development list, Philip R Auld, Vincent Hanquez,
	Jens Axboe, Christian Limpach

Kurt Garloff wrote:

> Hi Niv,
> 
> On Thu, Mar 31, 2005 at 08:27:30AM -0800, Nivedita Singhvi wrote:
> 
>>Although the usual answer for what scheduling algorithm is
>>best is almost always "depends on the workload", it was
>>suggested to me that the cfq was still the best option to
>>go with. What do people feel about that? (Or is AS going
>>to remain default?).
> 
> 
> This is a different discussion.

Yes, I did change the subject a little ;).

> But, yes, I would agree that CFQ (v3) is the best default choice.

Yep, even though some of the complications in the Xen
environment (as you point out below) will have to be addressed.

> Jens, should we maybe make sure that the blockback driver uses 
> different (fake) UIDs for the domains that it serves, to provide 
> fairness between them? The next step would be to allow tweaking 
> IO priorities. Or, to make it more general, add a parameter (call
> it uid) that a block driver can pass down to the IO scheduler
> and that would normally be current->uid but may be set differently?


> It's part of 2.6.11.
> garloff@tpkurt:~ [0]$ cat /sys/block/hda/queue/scheduler
> noop anticipatory deadline [cfq]

I just saw Jens' reply as well. This is much goodness :).
Very handy indeed!

thanks,
Nivedita

Thread overview: 29+ messages
2005-03-30 11:16 RE: RE: poor domU VBD performance Ian Pratt
2005-03-30 17:01 ` peter bier
2005-03-30 18:05   ` Andrew Theurer
2005-03-31  7:05 ` RE: " Jens Axboe
2005-03-31  7:10   ` Jens Axboe
2005-03-31  8:17     ` Keir Fraser
2005-03-31  8:19       ` Jens Axboe
2005-03-31 14:33         ` Philip R Auld
2005-03-31 15:34           ` Kurt Garloff
2005-03-31 15:39             ` Jens Axboe
2005-03-31 15:41               ` Jens Axboe
2005-03-31 16:27                 ` Nivedita Singhvi
2005-03-31 17:43                   ` Jens Axboe
2005-03-31 18:27                   ` Kurt Garloff
2005-03-31 21:59                     ` Nivedita Singhvi
2005-03-31 15:49               ` Keir Fraser
2005-03-31 16:02                 ` Andrew Theurer
2005-03-31 17:44                 ` Jens Axboe
2005-03-31 16:55               ` Philip R Auld
2005-03-31 16:53             ` Philip R Auld
2005-03-31 18:01               ` Jens Axboe
2005-03-31 18:43                 ` Philip R Auld
2005-03-31 19:07                   ` Keir Fraser
2005-03-31 19:10                     ` Keir Fraser
2005-03-31 19:20                     ` Jens Axboe
2005-03-31 19:21                   ` Jens Axboe
     [not found] <A95E2296287EAD4EB592B5DEEFCE0E9D1E3905@liverpoolst.ad.cl.cam.ac.uk>
2005-03-29 22:45 ` RE: " Kurt Garloff
2005-03-30  8:53   ` Jens Axboe
2005-03-30 10:00   ` Kurt Garloff