* [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
@ 2002-12-07 5:20 Con Kolivas
From: Con Kolivas @ 2002-12-07 5:20 UTC (permalink / raw)
To: linux kernel mailing list; +Cc: Andrew Morton
Here are some io_load contest benchmarks of 2.4.20 with the read latency2
patch applied, varying max bomb segments from 1 to 6 (SMP used to save
time!)
io_load:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.20 [5] 164.9 45 31 21 4.55
2420rl2b1 [5] 93.5 81 18 22 2.58
2420rl2b2 [5] 88.2 87 16 22 2.44
2420rl2b4 [5] 87.8 84 17 22 2.42
2420rl2b6 [5] 100.3 77 19 22 2.77
io_other:
Kernel [runs] Time CPU% Loads LCPU% Ratio
2.4.20 [5] 89.6 86 17 21 2.47
2420rl2b1 [3] 48.1 156 9 21 1.33
2420rl2b2 [3] 50.0 149 9 21 1.38
2420rl2b4 [5] 51.9 141 10 21 1.43
2420rl2b6 [5] 52.1 142 9 20 1.44
There seems to be a limit to the benefit of decreasing max bomb segments. It
does not seem to have a significant effect on io load on another hard disk
(although read latency2 is overall much better than vanilla).
Con
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
@ 2002-12-07 5:55 Andrew Morton
From: Andrew Morton @ 2002-12-07 5:55 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list

Con Kolivas wrote:
>
> Here are some io_load contest benchmarks with 2.4.20 with the read latency2
> patch applied and varying the max bomb segments from 1-6 (SMP used to save
> time!)
>
> io_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.20 [5] 164.9 45 31 21 4.55
> 2420rl2b1 [5] 93.5 81 18 22 2.58
> 2420rl2b2 [5] 88.2 87 16 22 2.44
> 2420rl2b4 [5] 87.8 84 17 22 2.42
> 2420rl2b6 [5] 100.3 77 19 22 2.77

If the SMP machine is using SCSI then that tends to make the elevator
changes less effective, because the disk sort-of has its own internal
elevator which, in my testing on a Fujitsu disk, has the same ill-advised
design as the kernel's elevator: it treats reads and writes in a similar
manner.

Setting the tag depth to zero helps heaps.

But as you're interested in `desktop responsiveness' you should be
mostly testing against IDE disks. Their behaviour tends to be quite
different.

If you can turn on write caching on the SCSI disks, that would change
the picture too.

> io_other:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.20 [5] 89.6 86 17 21 2.47
> 2420rl2b1 [3] 48.1 156 9 21 1.33
> 2420rl2b2 [3] 50.0 149 9 21 1.38
> 2420rl2b4 [5] 51.9 141 10 21 1.43
> 2420rl2b6 [5] 52.1 142 9 20 1.44
>
> There seems to be a limit to the benefit of decreasing max bomb segments.
> It does not seem to have a significant effect on io load on another hard
> disk (although read latency2 is overall much better than vanilla).

Hm. I'm rather surprised it made much difference at all to io_other,
because you shouldn't have competing reads and writes against either
disk??

The problem io_other should be tickling is where `gcc' tries to
allocate a page but ends up having to write out someone else's data,
and gets stuck sleeping on the disk queue due to the activity of
other processes. (This doesn't happen much on a 4G machine, but it'll
happen a lot on a 256M machine.)

But that's a write-latency problem, not a read-latency one.
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Con Kolivas @ 2002-12-07 6:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux kernel mailing list

> If the SMP machine is using scsi then that tends to make the elevator
> changes less effective. Because the disk sort-of has its own internal
> elevator which in my testing on a Fujitsu disk has the same ill-advised
> design as the kernel's elevator: it treats reads and writes in a similar
> manner.

These are IDE disks, in the same format as those used in the UP machine, so
it should still be showing the same effect? I think higher numbers in UP
would increase the resolution of these results. Apart from that, is there
any disadvantage to doing it in SMP? If you think it's worth running them
in UP mode I'll do that.

> Setting the tag depth to zero helps heaps.
>
> But as you're interested in `desktop responsiveness' you should be
> mostly testing against IDE disks. Their behavour tends to be quite
> different.
>
> If you can turn on write caching on the SCSI disks that would change
> the picture too.
>
> hm. I'm rather surprised it made much difference at all to io_other,
> because you shouldn't have competing reads and writes against either
> disk??

Some of the partitions are mounted on that other disk as well, so
occasionally it is involved in the kernel compile:

/dev/hda8 on / type ext3 (rw)
none on /proc type proc (rw)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/pts type devpts (rw,mode=0620)
/dev/hda7 on /home type ext3 (rw)
/dev/hda5 on /tmp type ext3 (rw)
/dev/hdb5 on /usr type ext3 (rw)
/dev/hdb1 on /var type ext3 (rw)

The testing is done from /dev/hda7; io_load writes to /dev/hda7 and
io_other writes to /dev/hdb1.

Unfortunately this is the way the OSDL machine was set up for me. I should
have been more specific in my requests, but I didn't realise they were
doing this. There isn't really that much spare space on the two drives to
shuffle the partitioning around, and contest can use huge amounts of space
during testing :\

> The problem io_other should be tickling is where `gcc' tries to
> allocate a page but ends up having to write out someone else's data,
> and gets stuck sleeping on the disk queue due to the activity of
> other processes. (This doesn't happen much on a 4G machine, but it'll
> happen a lot on a 256M machine).
>
> But that's a write-latency problem, not a read-latency one.

Con
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Andrew Morton @ 2002-12-07 6:14 UTC (permalink / raw)
To: Con Kolivas; +Cc: linux kernel mailing list

Con Kolivas wrote:
>
> ...
> These are ide disks, in the same format as those used in the UP machine,
> so it still should be showing the same effect? I think higher numbers in
> UP would increase the resolution more for these results - apart from that
> is there any disadvantage to doing it in SMP? If you think it's worth
> running them in UP mode I'll do that.

Oh, OK. I was guessing, and guessed wrong.

No, I don't expect you'd see much difference switching to UP for those
tests which are sensitive to the IO scheduler policy.
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: GrandMasterLee @ 2002-12-07 6:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux kernel mailing list

On Fri, 2002-12-06 at 23:55, Andrew Morton wrote:
[...]
> If the SMP machine is using scsi then that tends to make the elevator
> changes less effective. Because the disk sort-of has its own internal
> elevator which in my testing on a Fujitsu disk has the same ill-advised
> design as the kernel's elevator: it treats reads and writes in a similar
> manner.
>
> Setting the tag depth to zero helps heaps.

Command tag queue, as in the compile-time option? Or do you mean queue
depth? (Or are they the same?)

> But as you're interested in `desktop responsiveness' you should be
> mostly testing against IDE disks. Their behavour tends to be quite
> different.
>
> If you can turn on write caching on the SCSI disks that would change
> the picture too.

Just for clarity: what about something like FC-attached storage, where
the controllers enforce cache policies on a "per volume" basis? Would
that amount to the same thing?

--The GrandMaster
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: GrandMasterLee @ 2002-12-07 6:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: Con Kolivas, linux kernel mailing list

On Fri, 2002-12-06 at 23:55, Andrew Morton wrote:
[...]
> Setting the tag depth to zero helps heaps.
>
> But as you're interested in `desktop responsiveness' you should be
> mostly testing against IDE disks. Their behavour tends to be quite
> different.

One interesting thing about my current setup, with all SCSI or FC disks,
is that bomb never displays > 0. Example:

elvtune /dev/sdn yields:

/dev/sdn elevator ID 17
        read_latency:           8192
        write_latency:          16384
        max_bomb_segments:      0

elvtune -b 6 /dev/sdn yields:

/dev/sdn elevator ID 17
        read_latency:           8192
        write_latency:          16384
        max_bomb_segments:      0

Is it because I just do volume management at the hardware level and use
whole disks? Or is it something else?
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Andrew Morton @ 2002-12-07 6:45 UTC (permalink / raw)
To: GrandMasterLee; +Cc: Con Kolivas, linux kernel mailing list

GrandMasterLee wrote:
>
> ...
> One interesting thing about my current setup, with all scsi or FC disks,
> is that bomb never displays > 0.
> Example:
>
> elvtune /dev/sdn yields:
>
> /dev/sdn elevator ID 17
>         read_latency:           8192
>         write_latency:          16384
>         max_bomb_segments:      0
>
> elvtune -b 6 /dev/sdn yields:
>
> /dev/sdn elevator ID 17
>         read_latency:           8192
>         write_latency:          16384
>         max_bomb_segments:      0
>
> Is it because I just do volume management at the hardware level and use
> whole disks? Or is that something else?

You need a patched kernel. max_bomb_segments is some old thing which
isn't implemented any more, but I reused it for something completely
different in the patch which Con is testing, so I wouldn't have to futz
around with patching userspace apps.
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Con Kolivas @ 2002-12-07 13:29 UTC (permalink / raw)
To: linux kernel mailing list; +Cc: Andrew Morton

> Here are some io_load contest benchmarks with 2.4.20 with the read latency2
> patch applied and varying the max bomb segments from 1-6 (SMP used to save
> time!)
>
> io_load:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.20 [5] 164.9 45 31 21 4.55
> 2420rl2b1 [5] 93.5 81 18 22 2.58
> 2420rl2b2 [5] 88.2 87 16 22 2.44
> 2420rl2b4 [5] 87.8 84 17 22 2.42
> 2420rl2b6 [5] 100.3 77 19 22 2.77
>
> io_other:
> Kernel [runs] Time CPU% Loads LCPU% Ratio
> 2.4.20 [5] 89.6 86 17 21 2.47
> 2420rl2b1 [3] 48.1 156 9 21 1.33
> 2420rl2b2 [3] 50.0 149 9 21 1.38
> 2420rl2b4 [5] 51.9 141 10 21 1.43
> 2420rl2b6 [5] 52.1 142 9 20 1.44
>
> There seems to be a limit to the benefit of decreasing max bomb segments.
> It does not seem to have a significant effect on io load on another hard
> disk (although read latency2 is overall much better than vanilla).
>
> Con

Further testing with changing values of the read and write latencies (with
max_bomb fixed at 4) and the read latency 2 patch in place shows no
significant change to these figures over a wide range of numbers. This was
not the case when I ran contest with different read latency values on the
vanilla kernel (where I found -r 512 to be a reasonable compromise,
according to Jens). Is there some other advantage to be gained by, say,
increasing these numbers? (Contest results don't change with higher numbers
either.)

Con
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Miquel van Smoorenburg @ 2002-12-10 10:50 UTC (permalink / raw)
To: linux-kernel

In article <200212071620.05503.conman@kolivas.net>,
Con Kolivas <conman@kolivas.net> wrote:
> Here are some io_load contest benchmarks with 2.4.20 with the read latency2
> patch applied

Where is the rl2 patch for 2.4.20-vanilla?

Mike.
--
They all laughed when I said I wanted to build a joke-telling machine.
Well, I showed them! Nobody's laughing *now*! -- acesteves@clix.pt
* Re: [BENCHMARK] max bomb segment tuning with read latency 2 patch in contest
From: Marc-Christian Petersen @ 2002-12-10 10:55 UTC (permalink / raw)
To: linux-kernel; +Cc: Miquel van Smoorenburg

On Tuesday 10 December 2002 11:50, Miquel van Smoorenburg wrote:

Hi Miquel,

> > Here are some io_load contest benchmarks with 2.4.20 with the read
> > latency2 patch applied
> Where is the rl2 patch for 2.4.20-vanilla ?

here.

ciao, Marc

[-- Attachment: read-latency2-2.4.20-vanilla.patch --]

This patch is designed to improve disk read latencies in the presence of
heavy write traffic. I'm proposing it for inclusion in 2.4.x.

It changes the disk elevator's treatment of read requests. Instead of
placing an unmergeable read at the tail of the list, it is placed a
tunable distance from the front. That distance is tuned with
`elvtune -b N'. After much testing, the default value of N (aka
max_bomb_segments) is 6.

Increasing max_bomb_segments penalises reads (a lot) and benefits writes
(a little). Setting max_bomb_segments to zero disables the feature.

There are two other changes here:

1: Currently, if a request's latency in the queue has expired, it becomes
   a barrier to all newly introduced sectors. With this patch, it becomes
   a barrier only to the introduction of *new* requests in the queue.
   Contiguous merges can still bypass an expired request.

   We still avoid the `infinite latency' problem because when all the
   requests in front of the expired one are at max_sectors, that's it.
   No more requests can be introduced in front of the expired one.

   This change gives improved merging and is worth 10-15% on dbench.

2: The request queues are big again. A minimum of 32 requests and a
   maximum of 1024. The maximum is reached on machines which have 512
   megabytes or more.

   Rationale: request merging/sorting is the *only* means we have of
   straightening out unordered requests from the application layer.
   There are some workloads where this simply collapses. The `random
   write' tests in iozone and in Juan's misc001 result in the machine
   being locked up for minutes, trickling stuff out to disk at 500 k/sec.
   Increasing the request queue size helps here. A bit.

   I believe the current 128-request limit was introduced in a (not very
   successful) attempt to reduce read latencies. Well, we don't need to
   do that now. (-ac kernels still have >1000 requests per queue.)

   It's worth another 10% on dbench.

One of the objectives here was to ensure that the tunable actually does
something useful. That it gives a good spectrum of control over the
write-throughput-versus-read-latency balance. It does that.

I'll spare you all the columns of testing numbers. Here's a summary of
the performance changes at the default elevator settings:

- Linear read throughput in the presence of a linear write is improved
  by 8x to 10x.

- Linear read throughput in the presence of seek-intensive writes (yup,
  dbench) is improved by 5x to 30x.

- Many-file read throughput (reading a kernel tree) in the presence of a
  streaming write is increased by 2x to 30x.

- dbench throughput is increased a little.

- The results vary greatly depending upon available memory. Generally,
  but not always, small-memory machines suffer latency more, and are
  benefitted more.

- Other benchmarks (iozone, bonnie++, tiobench) are unaltered. They all
  tend to just do single large writes.

On the downside:

- Linear write throughput in the presence of a large streaming read is
  halved.

- Linear write throughput in the presence of ten seek-intensive reading
  processes (read 10 separate kernel trees in parallel) is 7x lower.

- Linear write throughput in the presence of one seek-intensive reading
  process (kernel tree diff) is about 15% lower.

One thing which probably needs altering now is the default settings of
the elevator read and write latencies. It should be possible to increase
these significantly and get more throughput improvements. That's on my
todo list. Increasing the VM readahead parameters will probably be an
overall win.

This is a pretty fundamental change to the kernel. Please test this
patch. Not only for its goodness - it has tons of that. Try also to find
badness.

 drivers/block/elevator.c  |   85 +++++++++++++++++++++++++++++++++++-----
 drivers/block/ll_rw_blk.c |    8 ++--
 include/linux/elevator.h  |   43 ++++++-----------------
 3 files changed, 93 insertions(+), 43 deletions(-)

--- linux-akpm/drivers/block/elevator.c~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/drivers/block/elevator.c	Sun Nov 10 19:59:21 2002
@@ -80,25 +80,38 @@ int elevator_linus_merge(request_queue_t
 			struct buffer_head *bh, int rw,
 			int max_sectors)
 {
-	struct list_head *entry = &q->queue_head;
-	unsigned int count = bh->b_size >> 9, ret = ELEVATOR_NO_MERGE;
+	struct list_head *entry;
+	unsigned int count = bh->b_size >> 9;
+	unsigned int ret = ELEVATOR_NO_MERGE;
+	int merge_only = 0;
+	const int max_bomb_segments = q->elevator.max_bomb_segments;
 	struct request *__rq;
+	int passed_a_read = 0;
+
+	entry = &q->queue_head;
 
 	while ((entry = entry->prev) != head) {
 		__rq = blkdev_entry_to_request(entry);
 
-		/*
-		 * we can't insert beyond a zero sequence point
-		 */
-		if (__rq->elevator_sequence <= 0)
-			break;
+		if (__rq->elevator_sequence-- <= 0) {
+			/*
+			 * OK, we've exceeded someone's latency limit.
+			 * But we still continue to look for merges,
+			 * because they're so much better than seeks.
+			 */
+			merge_only = 1;
+		}
 		if (__rq->waiting)
 			continue;
 		if (__rq->rq_dev != bh->b_rdev)
 			continue;
-		if (!*req && bh_rq_in_between(bh, __rq, &q->queue_head))
+		if (!*req && !merge_only &&
+				bh_rq_in_between(bh, __rq, &q->queue_head)) {
 			*req = __rq;
+		}
+		if (__rq->cmd != WRITE)
+			passed_a_read = 1;
 		if (__rq->cmd != rw)
 			continue;
 		if (__rq->nr_sectors + count > max_sectors)
@@ -129,6 +142,57 @@ int elevator_linus_merge(request_queue_t
 		}
 	}
 
+	/*
+	 * If we failed to merge a read anywhere in the request
+	 * queue, we really don't want to place it at the end
+	 * of the list, behind lots of writes.  So place it near
+	 * the front.
+	 *
+	 * We don't want to place it in front of _all_ writes: that
+	 * would create lots of seeking, and isn't tunable.
+	 * We try to avoid promoting this read in front of existing
+	 * reads.
+	 *
+	 * max_bomb_segments becomes the maximum number of write
+	 * requests which we allow to remain in place in front of
+	 * a newly introduced read.  We weight things a little bit,
+	 * so large writes are more expensive than small ones, but it's
+	 * requests which count, not sectors.
+	 */
+	if (max_bomb_segments && rw == READ && !passed_a_read &&
+			ret == ELEVATOR_NO_MERGE) {
+		int cur_latency = 0;
+		struct request * const cur_request = *req;
+
+		entry = head->next;
+		while (entry != &q->queue_head) {
+			struct request *__rq;
+
+			if (entry == &q->queue_head)
+				BUG();
+			if (entry == q->queue_head.next &&
+					q->head_active && !q->plugged)
+				BUG();
+			__rq = blkdev_entry_to_request(entry);
+
+			if (__rq == cur_request) {
+				/*
+				 * This is where the old algorithm placed it.
+				 * There's no point pushing it further back,
+				 * so leave it here, in sorted order.
+				 */
+				break;
+			}
+			if (__rq->cmd == WRITE) {
+				cur_latency += 1 + __rq->nr_sectors / 64;
+				if (cur_latency >= max_bomb_segments) {
+					*req = __rq;
+					break;
+				}
+			}
+			entry = entry->next;
+		}
+	}
 	return ret;
 }
 
@@ -186,7 +250,7 @@ int blkelvget_ioctl(elevator_t * elevato
 	output.queue_ID			= elevator->queue_ID;
 	output.read_latency		= elevator->read_latency;
 	output.write_latency		= elevator->write_latency;
-	output.max_bomb_segments	= 0;
+	output.max_bomb_segments	= elevator->max_bomb_segments;
 
 	if (copy_to_user(arg, &output, sizeof(blkelv_ioctl_arg_t)))
 		return -EFAULT;
@@ -205,9 +269,12 @@ int blkelvset_ioctl(elevator_t * elevato
 		return -EINVAL;
 	if (input.write_latency < 0)
 		return -EINVAL;
+	if (input.max_bomb_segments < 0)
+		return -EINVAL;
 
 	elevator->read_latency		= input.read_latency;
 	elevator->write_latency		= input.write_latency;
+	elevator->max_bomb_segments	= input.max_bomb_segments;
 	return 0;
 }
 
--- linux-akpm/drivers/block/ll_rw_blk.c~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/drivers/block/ll_rw_blk.c	Sun Nov 10 19:53:53 2002
@@ -432,9 +432,11 @@ static void blk_init_free_list(request_q
 	si_meminfo(&si);
 	megs = si.totalram >> (20 - PAGE_SHIFT);
-	nr_requests = 128;
-	if (megs < 32)
-		nr_requests /= 2;
+	nr_requests = (megs * 2) & ~15;	/* One per half-megabyte */
+	if (nr_requests < 32)
+		nr_requests = 32;
+	if (nr_requests > 1024)
+		nr_requests = 1024;
 	blk_grow_request_list(q, nr_requests);
 	init_waitqueue_head(&q->wait_for_requests[0]);
 
--- linux-akpm/include/linux/elevator.h~read-latency2	Sun Nov 10 19:53:53 2002
+++ linux-akpm-akpm/include/linux/elevator.h	Sun Nov 10 19:57:20 2002
@@ -1,12 +1,9 @@
 #ifndef _LINUX_ELEVATOR_H
 #define _LINUX_ELEVATOR_H
 
-typedef void (elevator_fn) (struct request *, elevator_t *,
-			    struct list_head *,
-			    struct list_head *, int);
-
-typedef int (elevator_merge_fn) (request_queue_t *, struct request **, struct list_head *,
-				 struct buffer_head *, int, int);
+typedef int (elevator_merge_fn)(request_queue_t *, struct request **,
+				struct list_head *, struct buffer_head *bh,
+				int rw, int max_sectors);
 
 typedef void (elevator_merge_cleanup_fn) (request_queue_t *, struct request *, int);
 
@@ -16,6 +13,7 @@ struct elevator_s
 {
 	int read_latency;
 	int write_latency;
+	int max_bomb_segments;
 
 	elevator_merge_fn *elevator_merge_fn;
 	elevator_merge_req_fn *elevator_merge_req_fn;
@@ -23,13 +21,13 @@ struct elevator_s
 	unsigned int queue_ID;
 };
 
-int elevator_noop_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_noop_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_noop_merge_req(struct request *, struct request *);
-
-int elevator_linus_merge(request_queue_t *, struct request **, struct list_head *, struct buffer_head *, int, int);
-void elevator_linus_merge_cleanup(request_queue_t *, struct request *, int);
-void elevator_linus_merge_req(struct request *, struct request *);
+elevator_merge_fn elevator_noop_merge;
+elevator_merge_cleanup_fn elevator_noop_merge_cleanup;
+elevator_merge_req_fn elevator_noop_merge_req;
+
+elevator_merge_fn elevator_linus_merge;
+elevator_merge_cleanup_fn elevator_linus_merge_cleanup;
+elevator_merge_req_fn elevator_linus_merge_req;
 
 typedef struct blkelv_ioctl_arg_s {
 	int queue_ID;
@@ -53,22 +51,6 @@ extern void elevator_init(elevator_t *,
 #define ELEVATOR_FRONT_MERGE	1
 #define ELEVATOR_BACK_MERGE	2
 
-/*
- * This is used in the elevator algorithm.  We don't prioritise reads
- * over writes any more --- although reads are more time-critical than
- * writes, by treating them equally we increase filesystem throughput.
- * This turns out to give better overall performance.  -- sct
- */
-#define IN_ORDER(s1,s2) \
-	((((s1)->rq_dev == (s2)->rq_dev && \
-	   (s1)->sector < (s2)->sector)) || \
-	 (s1)->rq_dev < (s2)->rq_dev)
-
-#define BHRQ_IN_ORDER(bh, rq) \
-	((((bh)->b_rdev == (rq)->rq_dev && \
-	   (bh)->b_rsector < (rq)->sector)) || \
-	 (bh)->b_rdev < (rq)->rq_dev)
-
 static inline int elevator_request_latency(elevator_t * elevator, int rw)
 {
 	int latency;
@@ -86,7 +68,7 @@ static inline int elevator_request_laten
 ((elevator_t) {							\
 	0,			/* read_latency */		\
 	0,			/* write_latency */		\
-				\
+	0,			/* max_bomb_segments */		\
 	elevator_noop_merge,	/* elevator_merge_fn */		\
 	elevator_noop_merge_req,	/* elevator_merge_req_fn */	\
 	})
@@ -95,7 +77,7 @@ static inline int elevator_request_laten
 ((elevator_t) {							\
 	2048,			/* read passovers */		\
 	8192,			/* write passovers */		\
-				\
+	6,			/* max_bomb_segments */		\
 	elevator_linus_merge,	/* elevator_merge_fn */		\
 	elevator_linus_merge_req,	/* elevator_merge_req_fn */	\
 	})