public inbox for linux-kernel@vger.kernel.org
* [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
@ 2002-12-07  5:20 Con Kolivas
  2002-12-07  5:55 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Con Kolivas @ 2002-12-07  5:20 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: Andrew Morton

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here are some io_load contest benchmarks with 2.4.20 with the read latency2 
patch applied and varying the max bomb segments from 1-6 (SMP used to save 
time!)

io_load:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.20 [5]              164.9   45      31      21      4.55
2420rl2b1 [5]           93.5    81      18      22      2.58
2420rl2b2 [5]           88.2    87      16      22      2.44
2420rl2b4 [5]           87.8    84      17      22      2.42
2420rl2b6 [5]           100.3   77      19      22      2.77

io_other:
Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
2.4.20 [5]              89.6    86      17      21      2.47
2420rl2b1 [3]           48.1    156     9       21      1.33
2420rl2b2 [3]           50.0    149     9       21      1.38
2420rl2b4 [5]           51.9    141     10      21      1.43
2420rl2b6 [5]           52.1    142     9       20      1.44

There seems to be a limit to the benefit of decreasing max bomb segments. It 
does not seem to have a significant effect on io load on another hard disk 
(although read latency2 is overall much better than vanilla).

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE98YUEF6dfvkL3i1gRAn4kAJ4x414sM3G+8fVrXv2P0huRhNKicgCgqFyo
kCXIKMVtO/Zp+tM92qlUz4s=
=HOKs
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in  contest
  2002-12-07  5:20 [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
@ 2002-12-07  5:55 ` Andrew Morton
  2002-12-07  6:09   ` Con Kolivas
                     ` (2 more replies)
  2002-12-07 13:29 ` [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
  2002-12-10 10:50 ` Miquel van Smoorenburg
  2 siblings, 3 replies; 10+ messages in thread
From: Andrew Morton @ 2002-12-07  5:55 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

Con Kolivas wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Here are some io_load contest benchmarks with 2.4.20 with the read latency2
> patch applied and varying the max bomb segments from 1-6 (SMP used to save
> time!)
> 
> io_load:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.20 [5]              164.9   45      31      21      4.55
> 2420rl2b1 [5]           93.5    81      18      22      2.58
> 2420rl2b2 [5]           88.2    87      16      22      2.44
> 2420rl2b4 [5]           87.8    84      17      22      2.42
> 2420rl2b6 [5]           100.3   77      19      22      2.77

If the SMP machine is using scsi then that tends to make the elevator
changes less effective.  Because the disk sort-of has its own internal
elevator which in my testing on a Fujitsu disk has the same ill-advised
design as the kernel's elevator: it treats reads and writes in a similar
manner.

Setting the tag depth to zero helps heaps.

But as you're interested in `desktop responsiveness' you should be
mostly testing against IDE disks.  Their behaviour tends to be quite
different.

If you can turn on write caching on the SCSI disks that would change
the picture too.

> io_other:
> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
> 2.4.20 [5]              89.6    86      17      21      2.47
> 2420rl2b1 [3]           48.1    156     9       21      1.33
> 2420rl2b2 [3]           50.0    149     9       21      1.38
> 2420rl2b4 [5]           51.9    141     10      21      1.43
> 2420rl2b6 [5]           52.1    142     9       20      1.44
> 
> There seems to be a limit to the benefit of decreasing max bomb segments. It
> does not seem to have a significant effect on io load on another hard disk
> (although read latency2 is overall much better than vanilla).

hm.  I'm rather surprised it made much difference at all to io_other,
because you shouldn't have competing reads and writes against either
disk??

The problem which io_other should be tickling is where `gcc' tries to
allocate a page but ends up having to write out someone else's data,
and gets stuck sleeping on the disk queue due to the activity of
other processes.  (This doesn't happen much on a 4G machine, but it'll
happen a lot on a 256M machine).

But that's a write-latency problem, not a read-latency one.


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in  contest
  2002-12-07  5:55 ` Andrew Morton
@ 2002-12-07  6:09   ` Con Kolivas
  2002-12-07  6:14     ` Andrew Morton
  2002-12-07  6:15   ` GrandMasterLee
  2002-12-07  6:20   ` GrandMasterLee
  2 siblings, 1 reply; 10+ messages in thread
From: Con Kolivas @ 2002-12-07  6:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux kernel mailing list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>Con Kolivas wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Here are some io_load contest benchmarks with 2.4.20 with the read
>> latency2 patch applied and varying the max bomb segments from 1-6 (SMP
>> used to save time!)
>>
>> io_load:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.20 [5]              164.9   45      31      21      4.55
>> 2420rl2b1 [5]           93.5    81      18      22      2.58
>> 2420rl2b2 [5]           88.2    87      16      22      2.44
>> 2420rl2b4 [5]           87.8    84      17      22      2.42
>> 2420rl2b6 [5]           100.3   77      19      22      2.77
>
>If the SMP machine is using scsi then that tends to make the elevator
>changes less effective.  Because the disk sort-of has its own internal
>elevator which in my testing on a Fujitsu disk has the same ill-advised
>design as the kernel's elevator: it treats reads and writes in a similar
>manner.

These are ide disks, in the same format as those used in the UP machine, so it 
still should be showing the same effect? I think higher numbers in UP would 
increase the resolution more for these results - apart from that is there any 
disadvantage to doing it in SMP? If you think it's worth running them in UP 
mode I'll do that.

>
>Setting the tag depth to zero helps heaps.
>
>But as you're interested in `desktop responsiveness' you should be
>mostly testing against IDE disks.  Their behaviour tends to be quite
>different.
>
>If you can turn on write caching on the SCSI disks that would change
>the picture too.
>
>> io_other:
>> Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>> 2.4.20 [5]              89.6    86      17      21      2.47
>> 2420rl2b1 [3]           48.1    156     9       21      1.33
>> 2420rl2b2 [3]           50.0    149     9       21      1.38
>> 2420rl2b4 [5]           51.9    141     10      21      1.43
>> 2420rl2b6 [5]           52.1    142     9       20      1.44
>>
>> There seems to be a limit to the benefit of decreasing max bomb segments.
>> It does not seem to have a significant effect on io load on another hard
>> disk (although read latency2 is overall much better than vanilla).
>
>hm.  I'm rather surprised it made much difference at all to io_other,
>because you shouldn't have competing reads and writes against either
>disk??

Some of the partitions are mounted on that other disk as well so occasionally 
it is involved in the kernel compile.

/dev/hda8 on / type ext3 (rw)
none on /proc type proc (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,mode=0620)
/dev/hda7 on /home type ext3 (rw)
/dev/hda5 on /tmp type ext3 (rw)
/dev/hdb5 on /usr type ext3 (rw)
/dev/hdb1 on /var type ext3 (rw)

The testing is done from /dev/hda7 and io_load writes to /dev/hda7, io_other 
writes to /dev/hdb1

Unfortunately this is the way the osdl machine was set up for me. I should 
have been more specific in my requests but I didn't realise they were doing 
this. There isn't really that much spare space on the two drives to shuffle 
the partitioning around and contest can use huge amounts of space during 
testing :\

>The problem which io_other should be tickling is where `gcc' tries to
>allocate a page but ends up having to write out someone else's data,
>and gets stuck sleeping on the disk queue due to the activity of
>other processes.  (This doesn't happen much on a 4G machine, but it'll
>happen a lot on a 256M machine).
>
>But that's a write-latency problem, not a read-latency one.

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE98ZCsF6dfvkL3i1gRAtAJAKCipF5dOAp2g+ICRuV4xagT/qsvZgCfWhaN
eZsoUGwt5RjlGbZJiD+nYZI=
=OVHE
-----END PGP SIGNATURE-----


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in   contest
  2002-12-07  6:09   ` Con Kolivas
@ 2002-12-07  6:14     ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2002-12-07  6:14 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux kernel mailing list

Con Kolivas wrote:
> 
> ...
> >If the SMP machine is using scsi then that tends to make the elevator
> >changes less effective.  Because the disk sort-of has its own internal
> >elevator which in my testing on a Fujitsu disk has the same ill-advised
> >design as the kernel's elevator: it treats reads and writes in a similar
> >manner.
> 
> These are ide disks, in the same format as those used in the UP machine, so it
> still should be showing the same effect? I think higher numbers in UP would
> increase the resolution more for these results - apart from that is there any
> disadvantage to doing it in SMP? If you think it's worth running them in UP
> mode I'll do that.

Oh, OK.  I was guessing, and guessed wrong.  No, I don't expect you'd
see much difference switching to UP for those tests which are sensitive
to the IO scheduler policy.


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in  contest
  2002-12-07  5:55 ` Andrew Morton
  2002-12-07  6:09   ` Con Kolivas
@ 2002-12-07  6:15   ` GrandMasterLee
  2002-12-07  6:20   ` GrandMasterLee
  2 siblings, 0 replies; 10+ messages in thread
From: GrandMasterLee @ 2002-12-07  6:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, linux kernel mailing list

On Fri, 2002-12-06 at 23:55, Andrew Morton wrote:
[...]
> If the SMP machine is using scsi then that tends to make the elevator
> changes less effective.  Because the disk sort-of has its own internal
> elevator which in my testing on a Fujitsu disk has the same ill-advised
> design as the kernel's elevator: it treats reads and writes in a similar
> manner.
> 
> Setting the tag depth to zero helps heaps.

Command tag queue? As in the compile-time option? Or do you mean queue
depth? (Or are they the same?)

> But as you're interested in `desktop responsiveness' you should be
> mostly testing against IDE disks.  Their behaviour tends to be quite
> different.
> 
> If you can turn on write caching on the SCSI disks that would change
> the picture too.

Just for clarity, what about something like FC-attached storage,
where the controllers enforce cache policies on a "per volume" basis?
Would that == the same thing? 



--The GrandMaster


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in  contest
  2002-12-07  5:55 ` Andrew Morton
  2002-12-07  6:09   ` Con Kolivas
  2002-12-07  6:15   ` GrandMasterLee
@ 2002-12-07  6:20   ` GrandMasterLee
  2002-12-07  6:45     ` [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Andrew Morton
  2 siblings, 1 reply; 10+ messages in thread
From: GrandMasterLee @ 2002-12-07  6:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Con Kolivas, linux kernel mailing list

On Fri, 2002-12-06 at 23:55, Andrew Morton wrote:
[...]
> If the SMP machine is using scsi then that tends to make the elevator
> changes less effective.  Because the disk sort-of has its own internal
> elevator which in my testing on a Fujitsu disk has the same ill-advised
> design as the kernel's elevator: it treats reads and writes in a similar
> manner.
> 
> Setting the tag depth to zero helps heaps.
> 
> But as you're interested in `desktop responsiveness' you should be
> mostly testing against IDE disks.  Their behaviour tends to be quite
> different.


One interesting thing about my current setup, with all scsi or FC disks,
is that bomb never displays > 0. 
Example: 

elvtune /dev/sdn yields:

/dev/sdn elevator ID            17
        read_latency:           8192
        write_latency:          16384
        max_bomb_segments:      0

elvtune -b 6 /dev/sdn yields:

/dev/sdn elevator ID            17
        read_latency:           8192
        write_latency:          16384
        max_bomb_segments:      0

Is it because I just do volume management at the hardware level and use
whole disks? Or is that something else?


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
  2002-12-07  6:20   ` GrandMasterLee
@ 2002-12-07  6:45     ` Andrew Morton
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2002-12-07  6:45 UTC (permalink / raw)
  To: GrandMasterLee; +Cc: Con Kolivas, linux kernel mailing list

GrandMasterLee wrote:
> 
> ...
> One interesting thing about my current setup, with all scsi or FC disks,
> is that bomb never displays > 0.
> Example:
> 
> elvtune /dev/sdn yields:
> 
> /dev/sdn elevator ID            17
>         read_latency:           8192
>         write_latency:          16384
>         max_bomb_segments:      0
> 
> elvtune -b 6 /dev/sdn yields:
> 
> /dev/sdn elevator ID            17
>         read_latency:           8192
>         write_latency:          16384
>         max_bomb_segments:      0
> 
> Is it because I just do volume management at the hardware level and use
> whole disks? Or is that something else?

You need a patched kernel.  max_bomb_segments is some old thing
which isn't implemented any more.  But I reused it for something
completely different in the patch which Con is testing.  So I
wouldn't have to futz around with patching userspace apps.


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
  2002-12-07  5:20 [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
  2002-12-07  5:55 ` Andrew Morton
@ 2002-12-07 13:29 ` Con Kolivas
  2002-12-10 10:50 ` Miquel van Smoorenburg
  2 siblings, 0 replies; 10+ messages in thread
From: Con Kolivas @ 2002-12-07 13:29 UTC (permalink / raw)
  To: linux kernel mailing list; +Cc: Andrew Morton

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>Here are some io_load contest benchmarks with 2.4.20 with the read latency2
>patch applied and varying the max bomb segments from 1-6 (SMP used to save
>time!)
>
>io_load:
>Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>2.4.20 [5]              164.9   45      31      21      4.55
>2420rl2b1 [5]           93.5    81      18      22      2.58
>2420rl2b2 [5]           88.2    87      16      22      2.44
>2420rl2b4 [5]           87.8    84      17      22      2.42
>2420rl2b6 [5]           100.3   77      19      22      2.77
>
>io_other:
>Kernel [runs]           Time    CPU%    Loads   LCPU%   Ratio
>2.4.20 [5]              89.6    86      17      21      2.47
>2420rl2b1 [3]           48.1    156     9       21      1.33
>2420rl2b2 [3]           50.0    149     9       21      1.38
>2420rl2b4 [5]           51.9    141     10      21      1.43
>2420rl2b6 [5]           52.1    142     9       20      1.44
>
>There seems to be a limit to the benefit of decreasing max bomb segments. It
>does not seem to have a significant effect on io load on another hard disk
>(although read latency2 is overall much better than vanilla).
>
>Con

Further testing with changing values of the read and write latencies (with 
max_bomb fixed at 4) and the read latency 2 patch in place shows no 
significant change to these figures over a wide range of numbers. This was 
not the case when I ran contest with different read latency values on the 
vanilla kernel (where I found -r 512 to be a reasonable compromise, according 
to Jens). Is there some other advantage to be gained by, say, increasing 
these numbers? (Contest results don't change with higher numbers either.)

Con
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE98ffKF6dfvkL3i1gRAo01AJ0Zvs0x80vGF1hUillnIL4y+f6xRQCfZyni
YkNWPMORdfjRHfG5/6NxV4M=
=g1ht
-----END PGP SIGNATURE-----


* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
  2002-12-07  5:20 [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
  2002-12-07  5:55 ` Andrew Morton
  2002-12-07 13:29 ` [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
@ 2002-12-10 10:50 ` Miquel van Smoorenburg
  2002-12-10 10:55   ` Marc-Christian Petersen
  2 siblings, 1 reply; 10+ messages in thread
From: Miquel van Smoorenburg @ 2002-12-10 10:50 UTC (permalink / raw)
  To: linux-kernel

In article <200212071620.05503.conman@kolivas.net>,
Con Kolivas  <conman@kolivas.net> wrote:
>Here are some io_load contest benchmarks with 2.4.20 with the read latency2 
>patch applied

Where is the rl2 patch for 2.4.20-vanilla ?

Mike.
-- 
They all laughed when I said I wanted to build a joke-telling machine.
Well, I showed them! Nobody's laughing *now*! -- acesteves@clix.pt



* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
  2002-12-10 10:50 ` Miquel van Smoorenburg
@ 2002-12-10 10:55   ` Marc-Christian Petersen
  0 siblings, 0 replies; 10+ messages in thread
From: Marc-Christian Petersen @ 2002-12-10 10:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Miquel van Smoorenburg

[-- Attachment #1: Type: text/plain, Size: 247 bytes --]

On Tuesday 10 December 2002 11:50, Miquel van Smoorenburg wrote:

Hi Miquel,

> >Here are some io_load contest benchmarks with 2.4.20 with the read
> > latency2 patch applied
> Where is the rl2 patch for 2.4.20-vanilla ?
here.

ciao, Marc

[-- Attachment #2: read-latency2-2.4.20-vanilla.patch --]
[-- Type: text/x-diff, Size: 11707 bytes --]

This patch is designed to improve disk read latencies in the presence of
heavy write traffic.  I'm proposing it for inclusion in 2.4.x.

It changes the disk elevator's treatment of read requests.  Instead of
placing an unmergeable read at the tail of the list, it is placed a tunable
distance from the front.  That distance is tuned with `elvtune -b N'.

After much testing, the default value of N (aka max_bomb_segments) is 6.

Increasing max_bomb_segments penalises reads (a lot) and benefits
writes (a little).

Setting max_bomb_segments to zero disables the feature.
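To make the placement rule concrete, here is a toy Python model of the read
promotion described above. It is a sketch reconstructed from the patch text,
not kernel code: the helper name `place_read` and the list-of-tuples queue
representation are inventions for illustration, and it omits merging,
barriers, and the sorted-position short-circuit in the real scan.

```python
# Toy model of the read-latency2 read placement.  Answers: "after how
# many queued requests would a new, unmergeable read be inserted?"

def place_read(queue, max_bomb_segments):
    """queue is a list of ('R'|'W', nr_sectors) tuples, front first.

    Returns the index at which the new read is inserted.  With the
    feature disabled (max_bomb_segments == 0) the read goes to the
    tail, behind every queued write, as before the patch.
    """
    if max_bomb_segments == 0:
        return len(queue)           # old behaviour: tail of the list
    cur_latency = 0
    for i, (cmd, nr_sectors) in enumerate(queue):
        if cmd != 'W':
            # The real code gives up promotion entirely if a read was
            # passed; this toy simply doesn't count reads as weight.
            continue
        # Writes are weighted by size: requests count, but big ones
        # count a bit more (1 + nr_sectors/64 in the patch).
        cur_latency += 1 + nr_sectors // 64
        if cur_latency >= max_bomb_segments:
            return i + 1            # insert just behind this write
    return len(queue)

# Ten small writes queued: with the default of 6, a new read jumps
# ahead of all but six "bomb segments" worth of writes.
queue = [('W', 8)] * 10
print(place_read(queue, 6))   # 6
print(place_read(queue, 0))   # 10 (feature off: read goes to the tail)
```

Note how larger writes burn through the budget faster: with 128-sector
writes each one weighs 3, so the same read lands after only two of them.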

There are two other changes here:

1: Currently, if a request's latency in the queue is expired, it becomes a
   barrier to all newly introduced sectors.  With this patch, it becomes a
   barrier only to the introduction of *new* requests in the queue. 
   Contiguous merges can still bypass an expired request.

   We still avoid the `infinite latency' problem because when all the
   requests in front of the expired one are at max_sectors, that's it.  No
   more requests can be introduced in front of the expired one.

   This change gives improved merging and is worth 10-15% on dbench.

2: The request queues are big again.  A minimum of 32 requests and a
   maximum of 1024.  The maximum is reached on machines which have 512
   megabytes or more.

   Rationale: request merging/sorting is the *only* means we have of
   straightening out unordered requests from the application layer.  

   There are some workloads where this simply collapses.  The `random
   write' tests in iozone and in Juan's misc001 result in the machine being
   locked up for minutes, trickling stuff out to disk at 500 k/sec. 
   Increasing the request queue size helps here.  A bit.

   I believe the current 128-request limit was introduced in a (not very
   successful) attempt to reduce read latencies.  Well, we don't need to do
   that now.  (-ac kernels still have >1000 requests per queue).

   It's worth another 10% on dbench.
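The sizing rule in change 2 can be sanity-checked outside the kernel; this
is a direct Python transcription of the arithmetic in the ll_rw_blk.c hunk
further down (one request per half-megabyte of RAM, rounded down to a
multiple of 16, clamped to 32..1024):

```python
# Python transcription of the new blk_init_free_list() sizing:
# nr_requests = (megs * 2) & ~15, clamped to [32, 1024].

def nr_requests(megs):
    n = (megs * 2) & ~15   # one request per half-megabyte, 16-aligned
    return max(32, min(n, 1024))

# 512 MB (and anything bigger) hits the 1024-request ceiling,
# matching the "512 megabytes or more" claim above.
for megs in (16, 64, 100, 256, 512, 2048):
    print(megs, nr_requests(megs))
```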


One of the objectives here was to ensure that the tunable actually does
something useful.  That it gives a good spectrum of control over the
write-throughput-versus-read-latency balance.  It does that.

I'll spare you all the columns of testing numbers.  Here's a summary of
the performance changes at the default elevator settings:

- Linear read throughput in the presence of a linear write is improved
  by 8x to 10x

- Linear read throughput in the presence of seek-intensive writes (yup,
  dbench) is improved by 5x to 30x

- Many-file read throughput (reading a kernel tree) in the presence
  of a streaming write is increased by 2x to 30x

- dbench throughput is increased a little.

- the results vary greatly depending upon available memory.  Generally
  but not always, small-memory machines suffer latency more, and are
  benefitted more.

- other benchmarks (iozone, bonnie++, tiobench) are unaltered - they
  all tend to just do single large writes.

On the downside:

- linear write throughput in the presence of a large streaming read
  is halved.

- linear write throughput in the presence of ten seek-intensive
  reading processes (read 10 separate kernel trees in parallel) is 7x
  lower.

- linear write throughput in the presence of one seek-intensive
  reading process (kernel tree diff) is about 15% lower.


One thing which probably needs altering now is the default settings of
the elevator read and write latencies.  It should be possible to
increase these significantly and get more throughput improvements. 
That's on my todo list.

Increasing the VM readahead parameters will probably be an overall win.

This is a pretty fundamental change to the kernel.  Please test this
patch.  Not only for its goodness - it has tons of that.  Try also to
find badness.



 drivers/block/elevator.c  |   85 +++++++++++++++++++++++++++++++++++++++++-----
 drivers/block/ll_rw_blk.c |    8 ++--
 include/linux/elevator.h  |   43 ++++++-----------------
 3 files changed, 93 insertions(+), 43 deletions(-)

--- linux-akpm/drivers/block/elevator.c~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/drivers/block/elevator.c	Sun Nov 10 19:59:21 2002
@@ -80,25 +80,38 @@ int elevator_linus_merge(request_queue_t
 			 struct buffer_head *bh, int rw,
 			 int max_sectors)
 {
-	struct list_head *entry = &q->queue_head;
-	unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
+	struct list_head *entry;
+	unsigned int count = bh->b_size >> 9;
+	unsigned int ret = ELEVATOR_NO_MERGE;
+	int merge_only = 0;
+	const int max_bomb_segments = q->elevator.max_bomb_segments;
 	struct request *__rq;
+	int passed_a_read = 0;
+
+	entry = &q->queue_head;
 
 	while ((entry = entry->prev) != head) {
 		__rq = blkdev_entry_to_request(entry);
 
-		/*
-		 * we can't insert beyond a zero sequence point
-		 */
-		if (__rq->elevator_sequence <= 0)
-			break;
+		if (__rq->elevator_sequence-- <= 0) {
+			/*
+			 * OK, we've exceeded someone's latency limit.
+			 * But we still continue to look for merges,
+			 * because they're so much better than seeks.
+			 */
+			merge_only = 1;
+		}
 
 		if (__rq->waiting)
 			continue;
 		if (__rq->rq_dev != bh->b_rdev)
 			continue;
-		if (!*req && bh_rq_in_between(bh, __rq, &q->queue_head))
+		if (!*req && !merge_only &&
+				bh_rq_in_between(bh, __rq, &q->queue_head)) {
 			*req = __rq;
+		}
+		if (__rq->cmd != WRITE)
+			passed_a_read = 1;
 		if (__rq->cmd != rw)
 			continue;
 		if (__rq->nr_sectors + count > max_sectors)
@@ -129,6 +142,57 @@ int elevator_linus_merge(request_queue_t
 		}
 	}
 
+	/*
+	 * If we failed to merge a read anywhere in the request
+	 * queue, we really don't want to place it at the end
+	 * of the list, behind lots of writes.  So place it near
+	 * the front.
+	 *
+	 * We don't want to place it in front of _all_ writes: that
+	 * would create lots of seeking, and isn't tunable.
+	 * We try to avoid promoting this read in front of existing
+	 * reads.
+	 *
+	 * max_bomb_segments becomes the maximum number of write
+	 * requests which we allow to remain in place in front of
+	 * a newly introduced read.  We weight things a little bit,
+	 * so large writes are more expensive than small ones, but it's
+	 * requests which count, not sectors.
+	 */
+	if (max_bomb_segments && rw == READ && !passed_a_read &&
+				ret == ELEVATOR_NO_MERGE) {
+		int cur_latency = 0;
+		struct request * const cur_request = *req;
+
+		entry = head->next;
+		while (entry != &q->queue_head) {
+			struct request *__rq;
+
+			if (entry == &q->queue_head)
+				BUG();
+			if (entry == q->queue_head.next &&
+					q->head_active && !q->plugged)
+				BUG();
+			__rq = blkdev_entry_to_request(entry);
+
+			if (__rq == cur_request) {
+				/*
+				 * This is where the old algorithm placed it.
+				 * There's no point pushing it further back,
+				 * so leave it here, in sorted order.
+				 */
+				break;
+			}
+			if (__rq->cmd == WRITE) {
+				cur_latency += 1 + __rq->nr_sectors / 64;
+				if (cur_latency >= max_bomb_segments) {
+					*req = __rq;
+					break;
+				}
+			}
+			entry = entry->next;
+		}
+	}
 	return ret;
 }
 
@@ -186,7 +250,7 @@ int blkelvget_ioctl(elevator_t * elevato
 	output.queue_ID			= elevator->queue_ID;
 	output.read_latency		= elevator->read_latency;
 	output.write_latency		= elevator->write_latency;
-	output.max_bomb_segments	= 0;
+	output.max_bomb_segments	= elevator->max_bomb_segments;
 
 	if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
 		return -EFAULT;
@@ -205,9 +269,12 @@ int blkelvset_ioctl(elevator_t * elevato
 		return -EINVAL;
 	if (input.write_latency < 0)
 		return -EINVAL;
+	if (input.max_bomb_segments < 0)
+		return -EINVAL;
 
 	elevator->read_latency		= input.read_latency;
 	elevator->write_latency		= input.write_latency;
+	elevator->max_bomb_segments	= input.max_bomb_segments;
 	return 0;
 }
 
--- linux-akpm/drivers/block/ll_rw_blk.c~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/drivers/block/ll_rw_blk.c	Sun Nov 10 19:53:53 2002
@@ -432,9 +432,11 @@ static void blk_init_free_list(request_q
 
 	si_meminfo(&si);
 	megs = si.totalram >> (20 - PAGE_SHIFT);
-	nr_requests = 128;
-	if (megs < 32)
-		nr_requests /= 2;
+	nr_requests = (megs * 2) & ~15;	/* One per half-megabyte */
+	if (nr_requests < 32)
+		nr_requests = 32;
+	if (nr_requests > 1024)
+		nr_requests = 1024;
 	blk_grow_request_list(q, nr_requests);
 
 	init_waitqueue_head(&q->wait_for_requests[0]);
--- linux-akpm/include/linux/elevator.h~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/include/linux/elevator.h	Sun Nov 10 19:57:20 2002
@@ -1,12 +1,9 @@
 #ifndef _LINUX_ELEVATOR_H
 #define _LINUX_ELEVATOR_H
 
-typedef void (elevator_fn) (struct request *, elevator_t *,
-			    struct list_head *,
-			    struct list_head *, int);
-
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
-				 struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+				struct list_head *, struct buffer_head *bh,
+				int rw, int max_sectors);
 
 typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
 
@@ -16,6 +13,7 @@ struct elevator_s
 {
 	int read_latency;
 	int write_latency;
+	int max_bomb_segments;
 
 	elevator_merge_fn *elevator_merge_fn;
 	elevator_merge_req_fn *elevator_merge_req_fn;
@@ -23,13 +21,13 @@ struct elevator_s
 	unsigned int queue_ID;
 };
 
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn		elevator_noop_merge;
+elevator_merge_cleanup_fn	elevator_noop_merge_cleanup;
+elevator_merge_req_fn		elevator_noop_merge_req;
+
+elevator_merge_fn		elevator_linus_merge;
+elevator_merge_cleanup_fn	elevator_linus_merge_cleanup;
+elevator_merge_req_fn		elevator_linus_merge_req;
 
 typedef struct blkelv_ioctl_arg_s {
 	int queue_ID;
@@ -53,22 +51,6 @@ extern void elevator_init(elevator_t *, 
 #define ELEVATOR_FRONT_MERGE	1
 #define ELEVATOR_BACK_MERGE	2
 
-/*
- * This is used in the elevator algorithm.  We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance.  -- sct
- */
-#define IN_ORDER(s1,s2)				\
-	((((s1)->rq_dev == (s2)->rq_dev &&	\
-	   (s1)->sector < (s2)->sector)) ||	\
-	 (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq)			\
-	((((bh)->b_rdev == (rq)->rq_dev &&	\
-	   (bh)->b_rsector < (rq)->sector)) ||	\
-	 (bh)->b_rdev < (rq)->rq_dev)
-
 static inline int elevator_request_latency(elevator_t * elevator, int rw)
 {
 	int latency;
@@ -86,7 +68,7 @@ static inline int elevator_request_laten
 ((elevator_t) {								\
 	0,				/* read_latency */		\
 	0,				/* write_latency */		\
-									\
+	0,				/* max_bomb_segments */		\
 	elevator_noop_merge,		/* elevator_merge_fn */		\
 	elevator_noop_merge_req,	/* elevator_merge_req_fn */	\
 	})
@@ -95,7 +77,7 @@ static inline int elevator_request_laten
 ((elevator_t) {								\
 	2048,				/* read passovers */		\
 	8192,				/* write passovers */		\
-									\
+	6,				/* max_bomb_segments */		\
 	elevator_linus_merge,		/* elevator_merge_fn */		\
 	elevator_linus_merge_req,	/* elevator_merge_req_fn */	\
 	})


end of thread, other threads:[~2002-12-10 10:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-07  5:20 [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
2002-12-07  5:55 ` Andrew Morton
2002-12-07  6:09   ` Con Kolivas
2002-12-07  6:14     ` Andrew Morton
2002-12-07  6:15   ` GrandMasterLee
2002-12-07  6:20   ` GrandMasterLee
2002-12-07  6:45     ` [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Andrew Morton
2002-12-07 13:29 ` [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest Con Kolivas
2002-12-10 10:50 ` Miquel van Smoorenburg
2002-12-10 10:55   ` Marc-Christian Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox